Pig adapter

Overview

The Pig adapter allows you to write queries in SQL and execute them using Apache Pig.

A simple example

Let’s start with a simple example. First, we need a model definition, as follows.

{
  "version": "1.0",
  "defaultSchema": "PIG",
  "schemas": [ {
    "name": "PIG",
    "type": "custom",
    "factory": "org.apache.calcite.adapter.pig.PigSchemaFactory",
    "tables": [ {
      "name": "t",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.pig.PigTableFactory",
      "operand": {
        "file": "data.txt",
        "columns": ["tc0", "tc1"]
      }
    }, {
      "name": "s",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.pig.PigTableFactory",
      "operand": {
        "file": "data2.txt",
        "columns": ["sc0", "sc1"]
      }
    } ]
  } ]
}

Now, if you write the SQL query

select *
from "t"
join "s" on "tc1" = "sc0"

the Pig adapter will generate the Pig Latin script

t = LOAD 'data.txt' USING PigStorage() AS (tc0:chararray, tc1:chararray);
s = LOAD 'data2.txt' USING PigStorage() AS (sc0:chararray, sc1:chararray);
t = JOIN t BY tc1, s BY sc0;

which is then executed using Pig’s runtime, typically MapReduce on Apache Hadoop.
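To try this end to end, one option is to run the query through Calcite's JDBC driver. The following is a minimal sketch, not a definitive setup. It assumes that the model above is saved as model.json (a hypothetical file name), that the Calcite core, Calcite Pig adapter, and Pig libraries are on the classpath, and that the model's default schema is the one containing "t" and "s". The class name PigAdapterExample and the helper calciteUrl are illustrative only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PigAdapterExample {
  /** Builds a Calcite JDBC URL that points at a model file. */
  static String calciteUrl(String modelPath) {
    return "jdbc:calcite:model=" + modelPath;
  }

  public static void main(String[] args) throws SQLException {
    // Assumption: the model file path is passed as the first argument,
    // falling back to a hypothetical "model.json" in the working directory.
    String modelPath = args.length > 0 ? args[0] : "model.json";

    // The SQL query from the example above. The unqualified table names
    // assume the model's default schema contains "t" and "s".
    String sql = "select *\n"
        + "from \"t\"\n"
        + "join \"s\" on \"tc1\" = \"sc0\"";

    // Requires the Calcite JDBC driver and the Pig adapter on the classpath;
    // without them, getConnection throws SQLException ("No suitable driver").
    try (Connection connection =
             DriverManager.getConnection(calciteUrl(modelPath));
         Statement statement = connection.createStatement();
         ResultSet resultSet = statement.executeQuery(sql)) {
      int columnCount = resultSet.getMetaData().getColumnCount();
      while (resultSet.next()) {
        // Print each row as tab-separated column values.
        StringBuilder row = new StringBuilder();
        for (int i = 1; i <= columnCount; i++) {
          if (i > 1) {
            row.append('\t');
          }
          row.append(resultSet.getString(i));
        }
        System.out.println(row);
      }
    }
  }
}
```

The sketch only prints the joined rows; the Pig Latin script itself is generated and submitted by the adapter behind the JDBC call.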

Relationship to Piglet

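Calcite has another component called Piglet. It allows you to write queries in a subset of Pig Latin and execute them using any applicable Calcite adapter. Piglet is therefore the inverse of the Pig adapter: Piglet takes Pig Latin as input and runs it on Calcite, whereas the Pig adapter takes SQL as input and runs it on Pig.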