Elasticsearch adapter

For instructions on downloading and building Calcite, start with the tutorial.

Once you’ve managed to compile the project, you can return here to start querying Elasticsearch with Calcite. First, we need a model definition. The model gives Calcite the necessary parameters to create an instance of the Elasticsearch adapter. The models can contain definitions of materializations. The name of the tables defined in the model definition corresponds to types in Elasticsearch. The schema/database is represented by the index parameter in the model definition.

A basic example of a model file is given below:

{
  "version": "1.0",
  "defaultSchema": "elasticsearch",
  "schemas": [
    {
      "type": "custom",
      "name": "elasticsearch",
      "factory": "org.apache.calcite.adapter.elasticsearch.ElasticsearchSchemaFactory",
      "operand": {
        "coordinates": "{'127.0.0.1': 9300}",
        "userConfig": "{'bulk.flush.max.actions': 10, 'bulk.flush.max.size.mb': 1}",
        "index": "usa"
      }
    }
  ]
}

Assuming this file is stored as model.json, you can connect to Elasticsearch via sqlline as follows:

$ ./sqlline
sqlline> !connect jdbc:calcite:model=model.json admin admin

sqlline will now accept SQL queries which access your Elasticsearch types. The purpose of this adapter is to compile the query into the most efficient Elasticsearch SEARCH JSON possible by exploiting filtering and sorting directly in Elasticsearch where possible.

For example, in the example dataset there is an Elasticsearch type named zips under index named usa.

We can issue a simple query to fetch the names of all the states stored in the type zips. By default, Elasticsearch returns only 10 rows:

sqlline> SELECT * from "zips";
_MAP={pop=13367, loc=[-72.505565, 42.067203], city=EAST LONGMEADOW, id=01028, state=MA}
_MAP={pop=1652, loc=[-72.908793, 42.070234], city=TOLLAND, id=01034, state=MA}
_MAP={pop=3184, loc=[-72.616735, 42.38439], city=HATFIELD, id=01038, state=MA}
_MAP={pop=43704, loc=[-72.626193, 42.202007], city=HOLYOKE, id=01040, state=MA}
_MAP={pop=2084, loc=[-72.873341, 42.265301], city=HUNTINGTON, id=01050, state=MA}
_MAP={pop=1350, loc=[-72.703403, 42.354292], city=LEEDS, id=01053, state=MA}
_MAP={pop=8194, loc=[-72.319634, 42.101017], city=MONSON, id=01057, state=MA}
_MAP={pop=1732, loc=[-72.204592, 42.062734], city=WALES, id=01081, state=MA}
_MAP={pop=9808, loc=[-72.258285, 42.261831], city=WARE, id=01082, state=MA}
_MAP={pop=4441, loc=[-72.203639, 42.20734], city=WEST WARREN, id=01092, state=MA}

While executing this query, the Elasticsearch adapter is able to recognize that city can be filtered by Elasticsearch and state can be sorted by Elasticsearch in ascending order.

The final source json given to Elasticsearch is below:

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "city": "springfield"
              }
            }
          ]
        }
      }
    }
  },
  "fields": [
    "city",
    "state"
  ],
  "script_fields": {},
  "sort": [
    {
      "state": "asc"
    }
  ]
}

This is the initial version of the Calcite Elasticsearch adapter. Work is in progress to introduce new features like aggregations into it.