Druid is a fast, column-oriented, distributed data
store. It lets you execute OLAP-style queries
via a JSON-based query language.
Data can be loaded into Druid in batch mode or continuously; one of Druid’s key
differentiators is its ability to
load from a streaming source such as Kafka
and have the data available for query within milliseconds.
Calcite’s Druid adapter allows you to query the data using SQL,
combining it with data in other Calcite schemas.
First, we need a model.
The model gives Calcite the necessary parameters to create an instance
of the Druid adapter.
A basic example of a model file is given below:
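The sketch below shows the shape of such a model. The broker and coordinator URLs, the interval, and the dimension and metric lists are illustrative and abridged; consult the file in the Calcite source tree for the authoritative version.

```json
{
  "version": "1.0",
  "defaultSchema": "wiki",
  "schemas": [
    {
      "type": "custom",
      "name": "wiki",
      "factory": "org.apache.calcite.adapter.druid.DruidSchemaFactory",
      "operand": {
        "url": "http://localhost:8082",
        "coordinatorUrl": "http://localhost:8081"
      },
      "tables": [
        {
          "name": "wiki",
          "factory": "org.apache.calcite.adapter.druid.DruidTableFactory",
          "operand": {
            "dataSource": "wikiticker",
            "interval": "1900-01-09T00:00:00.000Z/2992-01-10T00:00:00.000Z",
            "timestampColumn": "time",
            "dimensions": [
              "channel",
              "cityName",
              "countryName",
              "page",
              "user"
            ],
            "metrics": [
              "count",
              "added",
              "deleted",
              "delta"
            ]
          }
        }
      ]
    }
  ]
}
```

The schema-level operand tells the adapter where Druid is running; each table-level operand maps a Calcite table onto a Druid data source.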
This file is stored as druid/src/test/resources/druid-wiki-model.json,
so you can connect to Druid via sqlline as follows:
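One way to connect is via sqlline. The session below is a sketch: the admin/admin credentials and the exact SQL for the top-5-countries query are assumptions, and result rows are omitted.

```sql
$ ./sqlline
sqlline> !connect jdbc:calcite:model=druid/src/test/resources/druid-wiki-model.json admin admin
sqlline> SELECT "countryName", COUNT(*) AS c
. . . .> FROM "wiki"
. . . .> WHERE "countryName" IS NOT NULL
. . . .> GROUP BY "countryName"
. . . .> ORDER BY c DESC
. . . .> LIMIT 5;
```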
That query shows the top 5 countries of origin of wiki page edits
on 2015-09-12 (the date covered by the wikiticker data set).
Now let’s see how the query was evaluated:
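Prefixing the query with EXPLAIN PLAN FOR displays the plan. The output below is illustrative only; operator names and field indexes vary by Calcite version.

```sql
sqlline> EXPLAIN PLAN FOR
. . . .> SELECT "countryName", COUNT(*) AS c
. . . .> FROM "wiki"
. . . .> WHERE "countryName" IS NOT NULL
. . . .> GROUP BY "countryName"
. . . .> ORDER BY c DESC
. . . .> LIMIT 5;

-- Illustrative plan shape:
-- EnumerableInterpreter
--   BindableSort(sort0=[$1], dir0=[DESC], fetch=[5])
--     DruidQuery(table=[[wiki, wiki]], filter=[IS NOT NULL($13)],
--         groups=[{13}], aggs=[[COUNT()]])
```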
That plan shows that Calcite was able to push down the GROUP BY
part of the query to Druid, including the COUNT(*) function,
but not the ORDER BY ... LIMIT. (We plan to lift this restriction.)
## Foodmart data set
The test VM also includes a data set that denormalizes
the sales, product and customer tables of the Foodmart schema
into a single Druid data set called “foodmart”.
You can access it via a model file analogous to the one above.
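For example, a query against the combined data source might look like the following. The "state_province" column name is an assumption based on the Foodmart customer table; adjust to the dimensions actually present in the data set.

```sql
SELECT "state_province", COUNT(*) AS c
FROM "foodmart"
GROUP BY "state_province"
ORDER BY c DESC;
```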
## Simplifying the model
If less metadata is provided in the model, the Druid adapter can discover
it automatically from Druid. Here is a schema equivalent to the previous one
but with dimensions, metrics and timestampColumn removed:
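A sketch of the simplified model (URLs and interval are illustrative, as before):

```json
{
  "version": "1.0",
  "defaultSchema": "wiki",
  "schemas": [
    {
      "type": "custom",
      "name": "wiki",
      "factory": "org.apache.calcite.adapter.druid.DruidSchemaFactory",
      "operand": {
        "url": "http://localhost:8082",
        "coordinatorUrl": "http://localhost:8081"
      },
      "tables": [
        {
          "name": "wiki",
          "factory": "org.apache.calcite.adapter.druid.DruidTableFactory",
          "operand": {
            "dataSource": "wikiticker",
            "interval": "1900-01-09T00:00:00.000Z/2992-01-10T00:00:00.000Z"
          }
        }
      ]
    }
  ]
}
```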
Calcite dispatches a segmentMetadata query
to Druid to discover the columns of the table.
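A Druid segmentMetadata query looks roughly like the one below; the interval and analysis types shown are illustrative.

```json
{
  "queryType": "segmentMetadata",
  "dataSource": "wikiticker",
  "merge": true,
  "analysisTypes": ["aggregators"],
  "intervals": ["1900-01-09T00:00:00.000Z/2992-01-10T00:00:00.000Z"]
}
```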
Now, let’s take out the tables element:
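With the tables element gone, the model reduces to a single custom schema (again, URLs are illustrative):

```json
{
  "version": "1.0",
  "defaultSchema": "wiki",
  "schemas": [
    {
      "type": "custom",
      "name": "wiki",
      "factory": "org.apache.calcite.adapter.druid.DruidSchemaFactory",
      "operand": {
        "url": "http://localhost:8082",
        "coordinatorUrl": "http://localhost:8081"
      }
    }
  ]
}
```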
Calcite discovers the “wikiticker” data source via a REST call to the Druid
coordinator. Now that the “wiki” table element is removed, the table is called
“wikiticker”. Any other data sources present in Druid will also appear as tables.
Our model is now a single schema based on a custom schema factory with only two
operands, so we can
dispense with the model
and supply the operands as part of the connect string:
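A connect string of this form might look as follows; the broker port (8082) and coordinator port (8081) are the conventional defaults and may differ in your installation:

```
jdbc:calcite:schemaFactory=org.apache.calcite.adapter.druid.DruidSchemaFactory; schema.url=http://localhost:8082; schema.coordinatorUrl=http://localhost:8081
```

Each `schema.`-prefixed property in the connect string becomes an operand of the schema factory, so no model file is needed.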