Cassandra adapter

For instructions on downloading and building Calcite, start with the tutorial.

Once you’ve managed to compile the project, you can return here to start querying Cassandra with Calcite. First, we need a model definition. The model gives Calcite the necessary parameters to create an instance of the Cassandra adapter. Note that while models can contain definitions of materializations, the adapter will attempt to automatically populate any materialized views defined in Cassandra.

A basic example of a model file is given below:

{
  version: '1.0',
  defaultSchema: 'twissandra',
  schemas: [
    {
      name: 'twissandra',
      type: 'custom',
      factory: 'org.apache.calcite.adapter.cassandra.CassandraSchemaFactory',
      operand: {
        host: 'localhost',
        keyspace: 'twissandra'
      }
    }
  ]
}

Note that you can also specify username and password keys along with the host and keyspace if your server requires authentication. Assuming this file is stored as model.json, you can connect to Cassandra via sqlline as follows:

$ ./sqlline
sqlline> !connect jdbc:calcite:model=model.json admin admin

sqlline will now accept SQL queries which access your CQL tables. However, you’re not restricted to issuing queries supported by CQL. Calcite allows you to perform complex operations such as aggregations or joins. The adapter will attempt to compile the query into the most efficient CQL possible by exploiting filtering and sorting directly in Cassandra where possible.

For example, in the example dataset there is a CQL table named timeline with username as the partition key and time as the clustering key.

We can issue a simple query to fetch the most recent tweet ID of the user by writing standard SQL:

sqlline> SELECT "tweet_id"
         FROM "timeline"
         WHERE "username" = 'JmuhsAaMdw'
         ORDER BY "time" DESC LIMIT 1;
+--------------------------------------+
| tweet_id                             |
+--------------------------------------+
| f3d3d4dc-d05b-11e5-b58b-90e2ba530b12 |
+--------------------------------------+

While executing this query, the Cassandra adapter is able to recognize that username is the partition key and can be filtered by Cassandra. It also recognizes the clustering key time and pushes the ordering to Cassandra as well.

The final CQL query given to Cassandra is below:

SELECT username, time, tweet_id
FROM "timeline"
WHERE username = 'JmuhsAaMdw'
ORDER BY time DESC ALLOW FILTERING;

There is still significant work to do in improving the flexibility and performance of the adapter, but if you’re looking for a quick way to gain additional insights into data stored in Cassandra, Calcite should prove useful.