Pig Latin Operators and Statements – A Complete Guide
1. Objective
In our previous blog, we covered the introduction to Apache Pig and the Pig architecture in detail. This article covers the basics of Pig Latin operators: arithmetic, comparison, type construction, and relational operators. We will also discuss Pig Latin statements with an example.
2. What is Pig Latin?
Pig Latin is the language used to analyze data in Hadoop with Apache Pig. An interpreter layer transforms Pig Latin statements into MapReduce jobs, which Hadoop then processes. Pig Latin is a simple language with SQL-like semantics, so it is easy to use productively. It has a rich set of built-in functions for data manipulation. Furthermore, you can extend it by writing user-defined functions (UDFs) in Java.
3. Pig Latin Operators
a. Arithmetic Operators
These Pig Latin operators are the basic mathematical operators.
Operator | Description | Example |
+ | Addition − It adds the values on either side of the operator. | If a = 10, b = 30, a + b gives 40 |
- | Subtraction − It subtracts the right-hand operand from the left-hand operand. | If a = 40, b = 30, a - b gives 10 |
* | Multiplication − It multiplies the values on either side of the operator. | If a = 40, b = 30, a * b gives 1200 |
/ | Division − It divides the left-hand operand by the right-hand operand. | If a = 40, b = 20, a / b gives 2 |
% | Modulus − It divides the left-hand operand by the right-hand operand and returns the remainder. | If a = 40, b = 30, a % b gives 10 |
? : | Bincond − It evaluates a Boolean expression and takes three operands: variable x = (expression) ? value1 if true : value2 if false. | b = (a == 1) ? 40 : 20; if a = 1 the value of b is 40, and if a != 1 the value of b is 20. |
CASE WHEN THEN ELSE END | Case − This operator is equivalent to a nested bincond. | CASE f2 % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END |
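As a minimal sketch of how these operators are typically used, the script below applies them inside a FOREACH ... GENERATE statement. The file name numbers.txt, the relation names, and the fields a and b are hypothetical and chosen only for illustration. The CASE expression assumes Pig 0.12 or later.

```pig
-- Hypothetical input: comma-separated lines with two integers, such as "40,30"
nums = LOAD 'numbers.txt' USING PigStorage(',') AS (a:int, b:int);

calc = FOREACH nums GENERATE
    a + b AS total,
    a - b AS difference,
    a * b AS product,
    a / b AS quotient,              -- integer division, because both fields are int
    a % b AS remainder,
    (a == 1 ? 40 : 20) AS picked,   -- bincond, wrapped in parentheses
    (CASE a % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END) AS parity;

DUMP calc;
```

The bincond and CASE expressions are wrapped in parentheses here, which keeps the expression boundaries unambiguous when an AS clause follows.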
b. Comparison Operators
This table contains the comparison operators of Pig Latin.
Operator | Description | Example |
== | Equal − It checks whether the values of the two operands are equal. If they are, the condition is true. | If a = 10, b = 20, then (a == b) is not true. |
!= | Not equal − It checks whether the values of the two operands differ. If the values are equal, the condition is false; otherwise it is true. | If a = 10, b = 20, then (a != b) is true. |
> | Greater than − It checks whether the value of the left operand is greater than that of the right operand. If it is, the condition is true. | If a = 10, b = 20, then (a > b) is not true. |
< | Less than − It checks whether the value of the left operand is less than that of the right operand. If it is, the condition is true. | If a = 10, b = 20, then (a < b) is true. |
>= | Greater than or equal to − It checks whether the value of the left operand is greater than or equal to that of the right operand. If it is, the condition is true. | If a = 20, b = 50, then (a >= b) is not true. |
<= | Less than or equal to − It checks whether the value of the left operand is less than or equal to that of the right operand. If it is, the condition is true. | If a = 20, b = 20, then (a <= b) is true. |
matches | Pattern matching − It checks whether the string on the left-hand side matches the regular expression constant on the right-hand side. | f1 matches '.*df.*' |
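Comparison operators appear most often in the condition of a FILTER statement. The following is a small sketch under assumed names: people.txt, the relation names, and the fields name and age are hypothetical.

```pig
-- Hypothetical input: tab-separated lines holding a name and an age
people = LOAD 'people.txt' AS (name:chararray, age:int);

adults    = FILTER people BY age >= 18;
not_forty = FILTER people BY age != 40;
in_range  = FILTER people BY (age > 20) AND (age <= 30);

-- matches takes a Java regular expression that must match the whole string,
-- which is why patterns are often written with a leading or trailing .*
d_names   = FILTER people BY name matches 'D.*';

DUMP d_names;
```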
c. Type Construction Operators
The following table describes the type construction operators of Pig Latin.
Operator | Description | Example |
() | Tuple constructor operator − This operator constructs a tuple. | (Dataflair, 20) |
{} | Bag constructor operator − To construct a bag, we use this operator. | {(Dataflair, 10), (training, 25)} |
[] | Map constructor operator − This operator constructs a map. | [name#DF, age#12] |
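The constructors can also be applied to fields inside FOREACH ... GENERATE to build complex values from simple ones. This sketch assumes a hypothetical students.txt file with the three fields shown; the relation names are illustrative only.

```pig
-- Hypothetical input: tab-separated lines with name, age and gpa
students = LOAD 'students.txt' AS (name:chararray, age:int, gpa:float);

as_tuple = FOREACH students GENERATE (name, age);     -- one tuple-valued field
as_bag   = FOREACH students GENERATE {(name, age)};   -- a bag holding one tuple
as_map   = FOREACH students GENERATE [name, gpa];     -- a map with name as key and gpa as value

DUMP as_map;
```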
d. Relational Operations
The following table describes the relational operators of Pig Latin.
Operator | Description |
Loading and Storing | |
LOAD | It loads the data from a file system into a relation. |
STORE | It stores a relation to the file system (local/HDFS). |
Filtering | |
FILTER | It removes unwanted rows from a relation. |
DISTINCT | It removes duplicate rows from a relation. |
FOREACH, GENERATE | It generates data transformations based on the columns of data. |
STREAM | It transforms a relation using an external program. |
Grouping and Joining | |
JOIN | We can join two or more relations. |
COGROUP | It groups the data in two or more relations. |
GROUP | It groups the data in a single relation. |
CROSS | We can create the cross product of two or more relations. |
Sorting | |
ORDER | It arranges a relation in an order based on one or more fields. |
LIMIT | We can get a particular number of tuples from a relation. |
Combining and Splitting | |
UNION | We can combine two or more relations into one relation. |
SPLIT | It splits a single relation into two or more relations. |
Diagnostic Operators | |
DUMP | It prints the contents of a relation on the console. |
DESCRIBE | It describes the schema of a relation. |
EXPLAIN | It shows the logical, physical, and MapReduce execution plans used to compute a relation. |
ILLUSTRATE | It displays the step-by-step execution of a sequence of statements. |
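To show how several of these operators fit together, here is a short sketch of a script that touches each group in the table. The input files, schemas, relation names, and salary thresholds are all hypothetical; it is an illustration, not a recommended pipeline.

```pig
-- Hypothetical inputs; file names and schemas are for illustration only
emp  = LOAD 'employees.txt' USING PigStorage(',')
       AS (id:int, name:chararray, dept_id:int, salary:double);
dept = LOAD 'departments.txt' USING PigStorage(',')
       AS (dept_id:int, dept_name:chararray);

-- Filtering
well_paid = FILTER emp BY salary > 50000.0;
dept_ids  = FOREACH emp GENERATE dept_id;
uniq_ids  = DISTINCT dept_ids;

-- Grouping and joining
by_dept   = GROUP emp BY dept_id;
avg_sal   = FOREACH by_dept GENERATE group AS dept_id, AVG(emp.salary) AS avg_salary;
joined    = JOIN emp BY dept_id, dept BY dept_id;

-- Sorting
sorted    = ORDER emp BY salary DESC;
top3      = LIMIT sorted 3;

-- Combining and splitting
SPLIT emp INTO low IF salary < 30000.0, high IF salary >= 30000.0;
everyone  = UNION low, high;

-- Diagnostics and storing
DESCRIBE avg_sal;
DUMP top3;
STORE avg_sal INTO 'avg_salary_by_dept' USING PigStorage(',');
```

After GROUP, each output tuple has a field named group holding the key and a bag named after the input relation (emp here), which is why the average is written as AVG(emp.salary).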
4. Pig Latin – Statements
Statements are the basic constructs for processing data with Pig Latin.
- Statements work with relations and can include expressions and schemas.
- Every statement terminates with a semicolon (;).
- We perform the various operations using the operators that Pig Latin provides.
- Except for LOAD and STORE, a Pig Latin statement takes a relation as input and produces another relation as output.
- When we enter a LOAD statement in the Grunt shell, Pig only performs semantic checking. To view the contents of the relation, we use the DUMP operator. The MapReduce job that loads the data from the file system runs only after the DUMP operation.
For Example
The following Pig Latin statement loads data into Apache Pig.
grunt> Sample_data = LOAD 'sample_data.txt' USING PigStorage(',') AS (id:int, name:chararray, contact:chararray, city:chararray);
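As a brief sketch of the lazy execution described in the list above, the lines below repeat the same LOAD as a script (without the grunt> prompt) and then add the two diagnostic statements you would typically run next. It assumes that sample_data.txt exists and matches the declared schema.

```pig
-- Same LOAD as above, written as a script
Sample_data = LOAD 'sample_data.txt' USING PigStorage(',')
              AS (id:int, name:chararray, contact:chararray, city:chararray);

-- At this point the statement has only been parsed and checked; no job has run
DESCRIBE Sample_data;   -- prints the schema without reading the data

-- DUMP triggers execution and prints the tuples on the console
DUMP Sample_data;
```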
So, this was all about the Pig Latin tutorial. We hope you liked our explanation.
5. Conclusion
Thus, in this Pig Latin tutorial, we discussed how the Pig Latin language analyzes data in Hadoop and how its statements are transformed into MapReduce jobs. We also covered its operators and its set of data manipulation functions. Finally, we saw that Pig Latin statements are the basic constructs for data processing.