Pig adapter
Overview
The Pig adapter allows you to write queries in SQL and execute them using Apache Pig.
A simple example
Let’s start with a simple example. First, we need a model definition, as follows.
{
  "version": "1.0",
  "defaultSchema": "PIG",
  "schemas": [ {
    "name": "PIG",
    "type": "custom",
    "factory": "org.apache.calcite.adapter.pig.PigSchemaFactory",
    "tables": [ {
      "name": "t",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.pig.PigTableFactory",
      "operand": {
        "file": "data.txt",
        "columns": ["tc0", "tc1"]
      }
    }, {
      "name": "s",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.pig.PigTableFactory",
      "operand": {
        "file": "data2.txt",
        "columns": ["sc0", "sc1"]
      }
    } ]
  } ]
}
Now, if you write the SQL query
select *
from "t"
join "s" on "tc1" = "sc0"
the Pig adapter will generate the Pig Latin script
t = LOAD 'data.txt' USING PigStorage() AS (tc0:chararray, tc1:chararray);
s = LOAD 'data2.txt' USING PigStorage() AS (sc0:chararray, sc1:chararray);
t = JOIN t BY tc1, s BY sc0;
which is then executed using Pig’s runtime, typically MapReduce on Apache Hadoop.
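To make the mapping from the model to the script concrete, here is a minimal Java sketch that rebuilds the two LOAD statements above from each table's "file" and "columns" operands. It is an illustration only, not the adapter's actual code: the class PigLoadSketch and the method loadStatement are hypothetical names, and the sketch assumes every column is typed as chararray, as in the generated script shown above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; this class is not part of Calcite.
public class PigLoadSketch {

    // Builds one Pig Latin LOAD statement from a table's alias, file, and column names.
    // Assumption: every column is declared as chararray, mirroring the example script.
    public static String loadStatement(String alias, String file, List<String> columns) {
        String schema = columns.stream()
            .map(column -> column + ":chararray")
            .collect(Collectors.joining(", "));
        return alias + " = LOAD '" + file + "' USING PigStorage() AS (" + schema + ");";
    }

    public static void main(String[] args) {
        // The two tables defined in the model above.
        System.out.println(loadStatement("t", "data.txt", List.of("tc0", "tc1")));
        System.out.println(loadStatement("s", "data2.txt", List.of("sc0", "sc1")));
    }
}
```

Running main prints the first two lines of the script above. The JOIN statement is derived from the query's join condition rather than from the model, so it is left out of this sketch.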
Relationship to Piglet
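Calcite has another component called Piglet. It allows you to write queries in a subset of Pig Latin and execute them using any applicable Calcite adapter. Piglet is therefore the inverse of the Pig adapter: Piglet takes Pig Latin as input and runs it through Calcite, whereas the Pig adapter takes SQL as input and runs it on Pig.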