New Avatica Repository

The Apache Calcite PMC is pleased to announce further growth of its sub-project, Avatica.

Avatica has been slowly growing inside of Calcite for many years (dating back to Optiq-0.4.x!). The team has now hoisted the Avatica code out of the Calcite repository into a repository of its own, which felt like the next logical step given the maturity of the project.

The previous “/avatica” directory in the Calcite repository has been removed, so further contributions should be submitted against the new repository. The canonical repository can be found at the ASF’s Git hosting, with a mirrored copy also available on GitHub at apache/calcite-avatica.

Release 1.12.0

The Apache Calcite PMC is pleased to announce Apache Calcite release 1.12.0.

In 2½ months, 29 contributors have resolved 95 issues. Here are some of the highlights.

Calcite now supports JDK 9 and Guava 21.0. (It continues to run on JDK 7 and 8, and on versions of Guava as early as 14.0.1. The default version of Guava remains 19.0, due to the Cassandra adapter’s dependencies, and the fact that Guava 21.0 requires JDK 8 or later.)

There are two new adapters:

  • The File adapter can read files of various formats (such as CSV, JSON, zipped files, and HTML) over various protocols (including file and HTTP). If reading HTML files, it can extract data from nested <TABLE> elements.
  • The Pig adapter provides a SQL interface to Apache Pig.

And there are continuing improvements in performance and stability of the Druid adapter. (The Druid project now embeds Calcite to provide SQL support, and there has been cross-fertilization between the projects.)

To err is human, as the saying goes. If you mis-type the name of a schema, table or column in a SQL statement, Calcite now helps you correct it. The error message indicates whether it was the schema, table or column that was not found; if the mistake was just due to an upper- or lower-case letter, it suggests the correct name.
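To illustrate the idea of a case-based suggestion, here is a minimal sketch. It is not Calcite's implementation: the class `NameSuggester`, its `suggest` method and the wording of the message are all hypothetical, and the real validator works on schemas and row types rather than plain lists of strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Calcite's API. */
final class NameSuggester {
  private NameSuggester() {}

  /**
   * Returns the first candidate that equals {@code name} when case is
   * ignored. Returns empty if {@code name} matches a candidate exactly
   * (nothing to correct) or if no candidate matches at all.
   */
  static Optional<String> suggest(String name, List<String> candidates) {
    if (candidates.contains(name)) {
      return Optional.empty(); // exact match: the name is already correct
    }
    return candidates.stream()
        .filter(c -> c.equalsIgnoreCase(name))
        .findFirst();
  }

  public static void main(String[] args) {
    List<String> columns = Arrays.asList("EMPNO", "ENAME", "DEPTNO");
    String typed = "deptno";
    // The message text below is invented for this sketch.
    String message = suggest(typed, columns)
        .map(s -> "Column '" + typed + "' not found; did you mean '" + s + "'?")
        .orElse("Column '" + typed + "' not found");
    System.out.println(message);
  }
}
```

The sketch only handles differences in case; a name that is wrong in any other way gets the plain "not found" message.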

New SQL syntax and functions:

  • HOP, TUMBLE and SESSION functions in the GROUP BY clause allow you to aggregate over window types (especially useful for streaming queries);
  • Experimental support for the MATCH_RECOGNIZE clause for Complex-Event Processing (CEP);
  • New YEAR, MONTH, WEEK, DAYOFYEAR, DAYOFMONTH, DAYOFWEEK, HOUR, MINUTE, SECOND, DATABASE, IFNULL, and USER functions to comply with the ODBC/JDBC standard. Also, EXTRACT now allows the corresponding time-unit arguments.
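
As a rough illustration of what grouping by a tumbling window means (for instance, a GROUP BY on TUMBLE(rowtime, INTERVAL '1' HOUR)), the sketch below assigns each timestamp to a fixed-size, non-overlapping window and counts rows per window. It shows the semantics only; the class `TumbleSketch` and its methods are hypothetical names, timestamps are assumed to be epoch milliseconds, and none of this reflects how Calcite evaluates the function internally.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of tumbling-window grouping; not Calcite code. */
final class TumbleSketch {
  private TumbleSketch() {}

  /** Returns the start of the fixed-size window that contains the timestamp. */
  static long tumbleStart(long timestampMillis, long windowMillis) {
    // floorMod keeps the result correct for negative timestamps too.
    return timestampMillis - Math.floorMod(timestampMillis, windowMillis);
  }

  /** Counts rows per window, keyed by window start, in window order. */
  static Map<Long, Integer> countPerWindow(long[] timestamps, long windowMillis) {
    Map<Long, Integer> counts = new TreeMap<>();
    for (long ts : timestamps) {
      counts.merge(tumbleStart(ts, windowMillis), 1, Integer::sum);
    }
    return counts;
  }
}
```

A hopping window differs in that windows overlap, so one row can contribute to several windows; a session window has no fixed size and closes after a gap in activity.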

See the release notes; download the release.

Release 1.11.0

The Apache Calcite PMC is pleased to announce Apache Calcite release 1.11.0.

Nearly three months after the previous release, there is a long list of improvements and bug-fixes, many of them making planner rules smarter. The following are some of the more important ones.

Several adapters have improvements:

  • The JDBC adapter can now push down DML (INSERT, UPDATE, DELETE), windowed aggregates (OVER), IS NULL and IS NOT NULL operators.
  • The Cassandra adapter now supports authentication.
  • Several key bug-fixes in the Druid adapter.

For correlated and uncorrelated sub-queries, we generate more efficient plans (for example, in some correlated queries we no longer require a sub-query to generate the values of the correlating variable), can now handle multiple correlations, and have also fixed a few correctness bugs.

New SQL syntax:

  • MINUS as a synonym for EXCEPT;
  • an AS JSON option for the EXPLAIN command;
  • compound identifiers in the target list of INSERT, allowing you to insert into individual fields of record-valued columns (or column families if you are using the Apache Phoenix adapter).

A variety of new and extended built-in functions: CONVERT, LTRIM, RTRIM, 3-parameter LOCATE and POSITION, RAND, RAND_INTEGER, and SUBSTRING applied to binary types.

There are minor but potentially breaking API changes in [CALCITE-1519] (interface SubqueryConverter becomes SubQueryConverter and some similar changes in the case of classes and methods) and [CALCITE-1530] (rename Visitor to Shuttle, and create a new class Visitor<R>). See the cases for more details.

See the release notes; download the release.

Release 1.9.0

The Apache Calcite PMC is pleased to announce Apache Calcite release 1.9.0.

This release includes extensions and fixes for the Druid adapter. New features were added, such as the capability to recognize and translate Timeseries and TopN Druid queries. Moreover, this release contains multiple bug fixes over the initial implementation of the adapter. It is worth mentioning that most of these fixes were contributed by Druid developers, which demonstrates the good reception of the adapter by that community.

We have added new SQL features too, e.g., support for LATERAL TABLE. There are multiple interesting extensions to the planner rules that should help produce better plans, such as avoiding doing the same join twice in the presence of COUNT DISTINCT, or simplifying the expressions in the plan further. In addition, we implemented a rule to convert predicates on EXTRACT function calls into date ranges. The rule is not specific to Druid; however, it should be particularly useful for identifying filter conditions on the time dimension of Druid data sources.
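
To make the EXTRACT rewrite concrete: a predicate such as EXTRACT(YEAR FROM d) = 2016 is satisfied by exactly the dates in the half-open range from 2016-01-01 up to, but not including, 2017-01-01. The sketch below computes such ranges with java.time. It is only an illustration of the equivalence; the class `ExtractRangeSketch` and its methods are hypothetical, and the actual planner rule operates on row expressions, not on LocalDate values.

```java
import java.time.LocalDate;

/** Hypothetical illustration of EXTRACT-to-range conversion; not Calcite code. */
final class ExtractRangeSketch {
  private ExtractRangeSketch() {}

  /** Range [start, end) of dates d for which EXTRACT(YEAR FROM d) = year. */
  static LocalDate[] yearRange(int year) {
    LocalDate start = LocalDate.of(year, 1, 1);
    return new LocalDate[] {start, start.plusYears(1)};
  }

  /** Range [start, end) of dates in the given year and month. */
  static LocalDate[] yearMonthRange(int year, int month) {
    LocalDate start = LocalDate.of(year, month, 1);
    return new LocalDate[] {start, start.plusMonths(1)};
  }

  /** Tests d >= range[0] AND d < range[1]. */
  static boolean inRange(LocalDate d, LocalDate[] range) {
    return !d.isBefore(range[0]) && d.isBefore(range[1]);
  }
}
```

A range predicate of this shape is something a back-end can push down as a time-interval filter, whereas a function call wrapped around the column usually is not.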

Finally, the release includes more than thirty bug-fixes, minor enhancements and internal changes to planner rules and APIs.

See the release notes; download the release.

Release 1.8.0

The Apache Calcite PMC is pleased to announce Apache Calcite release 1.8.0.

This release adds adapters for Elasticsearch and Druid. It is also now easier to make a JDBC connection based upon a single adapter.

There are several new SQL features: UNNEST with multiple arguments, MAP arguments and with a JOIN; a DESCRIBE statement; and a TRANSLATE function like the one in Oracle and PostgreSQL.

We also added support for SELECT without FROM (equivalent to the VALUES clause, and widely used in MySQL and PostgreSQL), and added a conformance parameter to allow you to selectively enable this and other SQL features.

And, as usual, there are a couple of dozen bug-fixes and enhancements to planner rules and APIs.

See the release notes; download the release.

Cassandra Adapter

A new Apache Calcite adapter allows you to access Apache Cassandra via industry-standard SQL.

You can map a Cassandra keyspace into Calcite as a schema, Cassandra CQL tables as tables, and execute SQL queries on them, which Calcite converts into CQL. Cassandra can define and maintain materialized views but the adapter goes further: it can transparently rewrite a query to use a materialized view even if the view is not mentioned in the query.

Read more about the adapter here.

The Cassandra adapter is available as part of Apache Calcite version 1.7.0, which has just been released. Calcite also has adapters for CSV and JSON files, JDBC data sources, MongoDB, Spark and Splunk.

Release 1.7.0

Apache Calcite 1.7.0 is the first release since Avatica became an independent project. Calcite now depends on Avatica in the same way as it does other libraries, via a Maven dependency. To see Avatica-related changes, see the release notes for Avatica 1.7.1.

We have added an adapter for Apache Cassandra. You can map a Cassandra keyspace into Calcite as a schema, Cassandra CQL tables as tables, and execute SQL queries on them, which Calcite converts into CQL. Cassandra can define and maintain materialized views but the adapter goes further: it can transparently rewrite a query to use a materialized view even if the view is not mentioned in the query.

This release adds an Oracle-compatibility mode. If you add fun=oracle to your JDBC connect string, you get all of the standard operators and functions plus Oracle-specific functions DECODE, NVL, LTRIM, RTRIM, GREATEST and LEAST. We look forward to adding more functions, and compatibility modes for other databases, in future releases.

We’ve replaced our use of JUL (java.util.logging) with SLF4J. SLF4J provides an API which Calcite can use independent of the logging implementation. This ultimately provides additional flexibility to users, allowing them to configure Calcite’s logging within their own chosen logging framework. This work was done in [CALCITE-669].

For users who previously configured JUL in Calcite, there are some differences, because some of the JUL logging levels do not exist in SLF4J: specifically FINE, FINER, and FINEST. To deal with this, FINE was mapped to SLF4J’s DEBUG level, while FINER and FINEST were mapped to SLF4J’s TRACE.
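
The level mapping can be written down as a small lookup, which may help when translating an old JUL configuration. In the sketch below only the FINE, FINER and FINEST cases come from the description above; the mappings for the remaining levels are assumptions made for completeness, and the class `JulToSlf4jLevels` is a hypothetical helper, not something shipped with Calcite.

```java
import java.util.logging.Level;

/** Hypothetical helper for illustration; not part of Calcite. */
final class JulToSlf4jLevels {
  private JulToSlf4jLevels() {}

  /** Returns the name of the SLF4J level used in place of a JUL level. */
  static String slf4jLevelFor(Level julLevel) {
    // These three cases follow the mapping described in the release notes.
    if (julLevel.equals(Level.FINE)) {
      return "DEBUG";
    }
    if (julLevel.equals(Level.FINER) || julLevel.equals(Level.FINEST)) {
      return "TRACE";
    }
    // Everything below is an assumption, not stated in the release notes.
    if (julLevel.equals(Level.SEVERE)) {
      return "ERROR";
    }
    if (julLevel.equals(Level.WARNING)) {
      return "WARN";
    }
    return "INFO"; // INFO, CONFIG and any other level
  }
}
```

So a logger that used to be set to FINER in a JUL properties file would now be set to TRACE in whichever SLF4J back-end you have chosen.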

See the release notes; download the release.

Streaming SQL in Samza

Julian Hyde gave a talk at the Apache Samza meetup in Mountain View, CA.

His talk asked the questions:

  • What is SamzaSQL, and what might I use it for?
  • Does this mean that Samza is turning into a database?
  • What is a query optimizer, and what can it do for my streaming queries?

The talk is available in [slides] and [video].

Calcite appoints Josh Elser to PMC

The Apache Calcite project management committee (PMC) today announced the appointment of Josh Elser to the committee.

Josh has only been a committer for a few months, but has become a prominent member of the Calcite project, and has taken leadership in several areas, not least in discussing the future of Avatica.

Release 1.6.0

As usual, in this release there are new SQL features, improvements to planning rules and Avatica, and lots of bug fixes. We’ll spotlight a couple of features that make it easier to handle complex queries.

[CALCITE-816] allows you to represent sub-queries (EXISTS, IN and scalar) as RexSubQuery, a kind of expression in the relational algebra. Until now, the sql-to-rel converter was burdened with expanding sub-queries, and people creating relational algebra directly (or via RelBuilder) could only create ‘flat’ relational expressions. Now we have planner rules to expand and de-correlate sub-queries.

Metadata is the fuel that powers query planning. It includes traditional query-planning statistics such as cost and row-count estimates, but also information such as which columns form unique keys, and what predicates are known to apply to a relational expression’s output rows. From the predicates we can deduce which columns are constant, and following [CALCITE-1023] we can now remove constant columns from GROUP BY keys.
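
As an illustration of the GROUP BY simplification: if a predicate such as deptno = 10 is known to hold, then grouping by (deptno, job) produces the same groups as grouping by job alone. The sketch below prunes constant columns from a list of group keys. It is a simplified model, not the planner rule itself; the class `ConstantGroupKeys` is hypothetical and columns are represented by name. Keeping one key when every key is constant is a choice made in this sketch, because an aggregate with no group keys returns one row on empty input whereas an aggregate with keys returns none.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of pruning constant group keys; not Calcite code. */
final class ConstantGroupKeys {
  private ConstantGroupKeys() {}

  /**
   * Returns the group keys that are still needed once columns known to be
   * constant are dropped. If every key is constant, the first key is kept
   * so that the query does not turn into a global aggregate.
   */
  static List<String> prune(List<String> groupKeys, Set<String> constantColumns) {
    List<String> kept = new ArrayList<>();
    for (String key : groupKeys) {
      if (!constantColumns.contains(key)) {
        kept.add(key);
      }
    }
    if (kept.isEmpty() && !groupKeys.isEmpty()) {
      kept.add(groupKeys.get(0)); // preserve empty-input behaviour
    }
    return kept;
  }
}
```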

Metadata is often computed recursively, and it is hard to safely and efficiently calculate metadata on a graph of RelNodes that is large, frequently cyclic, and constantly changing. [CALCITE-794] introduces a context to each metadata call. That context can detect cyclic metadata calls and produce a safe answer to the metadata request. It will also allow us to add finer-grained caching and further tune the metadata layer.
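
The idea of a per-call context that detects cycles can be sketched as follows. This is a minimal model under stated assumptions, not the design adopted in [CALCITE-794]: the class `MetadataCallContext`, the string keys, the toy row-count formula and the choice of 1.0 as the "safe" answer are all hypothetical, and the sketch is single-threaded.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.DoubleSupplier;

/** Hypothetical illustration of cycle detection in metadata calls. */
final class MetadataCallContext {
  /** Keys of the metadata requests that are currently being computed. */
  private final Set<String> inProgress = new HashSet<>();

  /**
   * Runs the computation unless the same request is already in progress
   * further up the call stack, in which case the safe default is returned.
   */
  double compute(String key, double safeDefault, DoubleSupplier computation) {
    if (!inProgress.add(key)) {
      return safeDefault; // cyclic call detected
    }
    try {
      return computation.getAsDouble();
    } finally {
      inProgress.remove(key);
    }
  }

  /** Toy row-count estimate: a leaf has 100 rows, any other node half its input. */
  static double rowCount(String node, Map<String, String> inputOf,
      MetadataCallContext ctx) {
    return ctx.compute("rowCount:" + node, 1.0d, () -> {
      String input = inputOf.get(node);
      return input == null ? 100.0d : rowCount(input, inputOf, ctx) / 2;
    });
  }
}
```

With a graph in which A reads from B and B reads from A, the recursive call for A returns the default instead of recursing forever, and the outer calls still complete with a finite estimate.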

See the release notes; download the release.

Release 1.5.0

This is our first release as a top-level Apache project! Thanks to everyone who has contributed to it.

In addition to a large number of bug fixes and minor enhancements, this release includes major improvements to Avatica, planner rules, and RelBuilder.

Further, we built Piglet, a subset of the classic Hadoop language Pig. Pig is particularly interesting because it makes heavy use of nested multi-sets. You can follow this example to implement your own query language, and immediately take advantage of Calcite’s back-ends and optimizer rules.

See the release notes; download the release.

Calcite Graduates

On October 21st, 2015 the board of the Apache Software Foundation voted to establish Calcite as a top-level Apache project.

Calcite's graduation cake

Describing itself as “the foundation for your next high-performance database”, Calcite is a framework for building data management systems. Calcite includes a comprehensive implementation of relational algebra and an extensible cost-based query optimizer. It also includes an optional SQL parser and JDBC driver.

Calcite joined Apache as an incubator project in May, 2014. To graduate from the incubator, projects have to prove that they can create high quality releases, form a diverse community, and operate as a meritocracy.

Calcite’s committers have delivered eight releases during incubation (roughly one every two months) including the milestone 1.0 release in January, 2015.

The project has become a key component in many high-performance databases, including the Apache Drill, Apache Hive, Apache Kylin and Apache Phoenix open source projects, and several commercial products.

Also, in collaboration with Apache Samza and Apache Storm, Calcite is developing streaming extensions to standard SQL.

The Calcite community met at a hangout on October 27th, 2015, and celebrated with a graduation cake.

XLDB 2015 best lightning talk

Julian Hyde’s talk “Apache Calcite: One planner fits all” won Best Lightning Talk at the XLDB-2015 conference (with Eric Tschetter’s talk “Sketchy Approximations”).

XLDB is an annual conference that brings together experts from science, industry and academia to find practical solutions to problems involving extremely large data sets.

As a result of winning Best Lightning Talk, Julian will get a 30 minute keynote speaking slot at XLDB-2016.

The talk is available in slides and video.

Algebra builder

Calcite’s foundation is a comprehensive implementation of relational algebra (together with transformation rules, cost model, and metadata) but to create algebra expressions you had to master a complex API.

We’re solving this problem by introducing an algebra builder, a single class with all the methods you need to build any relational expression.

For example,

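final FrameworkConfig config;
final RelBuilder builder = RelBuilder.create(config);
final RelNode node = builder
    .scan("EMP")
    .aggregate(builder.groupKey("DEPTNO"),
        builder.count(false, "C"),
        builder.sum(false, "S", builder.field("SAL")))
    .filter(builder.call(SqlStdOperatorTable.GREATER_THAN,
        builder.field("C"), builder.literal(10)))
    .build();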

creates the algebra

LogicalFilter(condition=[>($1, 10)])
  LogicalAggregate(group=[{7}], C=[COUNT()], S=[SUM($5)])
    LogicalTableScan(table=[[scott, EMP]])

which is equivalent to the SQL

SELECT deptno, count(*) AS c, sum(sal) AS s
FROM emp
GROUP BY deptno
HAVING count(*) > 10

The algebra builder documentation describes the full API and has lots of examples.

We’re still working on the algebra builder, but plan to release it with Calcite 1.4 (see [CALCITE-748]).

The algebra builder will make some existing tasks easier (such as writing planner rules), but will also enable new things, such as writing applications directly on top of Calcite, or implementing non-SQL query languages. These applications and languages will be able to take advantage of Calcite’s existing back-ends (including Hive-on-Tez, Drill, MongoDB, Splunk, Spark, JDBC data sources) and extensive set of query-optimization rules.

If you have questions or comments, please post to the mailing list.

Calcite adds 5 committers

The Calcite project management committee today added five new committers for their work on Calcite. Welcome all!

  • Aman Sinha
  • Jesús Camacho-Rodríguez
  • Jinfeng Ni
  • John Pullokkaran
  • Nick Dimiduk

Release 1.2.0 Incubating

A short release, less than a month after 1.1.

There have been many changes to Avatica, hugely improving its coverage of the JDBC API and overall robustness. A new provider, JdbcMeta, allows you to make an existing JDBC driver accessible remotely.

[CALCITE-606] improves how the planner propagates traits such as collation and distribution among relational expressions.

[CALCITE-613] and [CALCITE-307] improve implicit and explicit conversions in SQL.

See the release notes; download the release.

Release 1.1.0 Incubating

This Calcite release makes it possible to exploit physical properties of relational expressions to produce more efficient plans, introducing collation and distribution as traits, the Exchange relational operator, and several new forms of metadata.

We add experimental support for streaming SQL.

This release drops support for JDK 1.6; Calcite now requires 1.7 or later.

We have introduced static create methods for many sub-classes of RelNode. We strongly suggest that you use these rather than calling constructors directly.

See the release notes; download the release.

Release 1.0.0 Incubating

Calcite’s first major release.

Since the previous release we have re-organized the code into the org.apache.calcite namespace. To make migration of your code easier, we have described the mapping from old to new class names as an attachment to [CALCITE-296].

The release adds SQL support for GROUPING SETS, EXTEND, UPSERT and sequences; a remote JDBC driver; improvements to the planner engine and built-in planner rules; improvements to the algorithms that implement the relational algebra, including an interpreter that can evaluate queries without compilation; and fixes about 30 bugs.

See the release notes; download the release.

Release 0.9.2 Incubating

A fairly minor release, and last release before we rename all of the packages and lots of classes, in what we expect to call 1.0. If you have an existing application, it’s worth upgrading to this first, before you move on to 1.0.

See the release notes; download the release.

Calcite Twitter

The official @ApacheCalcite Twitter account pushes announcements about Calcite. If you give a talk about Calcite, let us know and we'll tweet it out and add it to the news section of the website.