What is Pig Latin and its Operators

Apache Pig offers a high-level language, Pig Latin, for writing data analysis programs. In this Pig Latin tutorial, we will discuss the basics of Pig Latin: its statements, data types, and general operators, along with a brief look at user-defined functions (UDFs). We will also see examples to understand each of them well.

So, let’s start the Pig Latin Tutorial.

What is Pig Latin?

When we need to analyze data in Hadoop using Apache Pig, we write the analysis in the Pig Latin language. An interpreter layer first transforms the Pig Latin statements into MapReduce jobs, and Hadoop then processes these jobs.

Pig Latin is a simple language with SQL-like semantics, so it is possible to use it in a productive manner. It also contains a rich set of functions for data manipulation.

Moreover, we can extend these functions easily by writing user-defined functions (UDFs) in Java. That implies Pig Latin is extensible in nature.
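
To make the extensibility point concrete, here is a minimal sketch of how a Java UDF is typically called from a Pig Latin script. The jar name myudfs.jar, the function myudfs.UPPER, and the input file are hypothetical; the sketch assumes you have already written and packaged a Java class that extends Pig's EvalFunc.

```pig
-- Hypothetical jar that contains our own Java UDF class myudfs.UPPER.
REGISTER myudfs.jar;

-- Hypothetical tab-separated input with a name and an age per line.
students = LOAD 'student_data.txt' AS (name:chararray, age:int);

-- Call the UDF by its fully qualified class name, like a built-in function.
upper_names = FOREACH students GENERATE myudfs.UPPER(name);

DUMP upper_names;
```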

Data Model in Pig Latin

The data model of Pig is fully nested. The outermost structure of the Pig Latin data model is a relation, and a relation is a bag, where:

  • A bag is a collection of tuples.
  • A tuple is an ordered set of fields.
  • A field is a piece of data.

Statements in Pig Latin

Statements are the basic constructs for processing data using Pig Latin.

  • Statements work with relations, and they include expressions and schemas.
  • Every statement ends with a semicolon (;).
  • Through statements, we perform several operations using the operators that Pig Latin offers.
  • Except for LOAD and STORE, every Pig Latin statement takes a relation as input and produces another relation as output.
  • When we enter a LOAD statement in the Grunt shell, Pig only carries out semantic checking. To see the contents of the relation, we need to use the DUMP operator, because the MapReduce job that loads the data is carried out only when the DUMP operation is performed.

Pig Latin Example
Here is a Pig Latin statement that loads data into Apache Pig.

grunt> Employee_data = LOAD 'Employee_data.txt' USING PigStorage(',') AS
  ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
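
Building on the LOAD statement above, the following sketch shows how relations flow from one statement to the next and where execution is actually triggered. The city value 'Pune' and the output directory 'pune_names' are made up for illustration.

```pig
-- Continues from the Employee_data relation loaded above.
-- DESCRIBE prints the schema of the relation.
DESCRIBE Employee_data;

-- Takes a relation as input and produces another relation ('Pune' is a made-up value).
pune_employees = FILTER Employee_data BY city == 'Pune';

-- Project two fields from the filtered relation.
names = FOREACH pune_employees GENERATE firstname, lastname;

-- DUMP triggers execution and prints the result to the console.
DUMP names;

-- STORE writes the result to a directory instead (hypothetical output path).
STORE names INTO 'pune_names' USING PigStorage(',');
```

Nothing is read from 'Employee_data.txt' until DUMP or STORE is reached, which is why a typo in the file name usually surfaces only at that point.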

Pig Latin Datatypes

The following is the list of Pig Latin data types:

  • int

It represents a signed 32-bit integer.
For Example: 10

  • long

It represents a signed 64-bit integer.
For Example: 10L

  • float

This data type represents a signed 32-bit floating point.
For Example: 10.5F

  • double

“double” represents a 64-bit floating point.
For Example: 10.5

  • chararray

It represents a character array (string) in Unicode UTF-8 format.
For Example: 'Data Flair'

  • bytearray

This data type represents a byte array (blob).

  • boolean

It represents a Boolean value.
For Example: true / false
Note: These constants are case insensitive.

  • datetime

It represents a date-time.
For Example: 1970-01-01T00:00:00.000+00:00

  • biginteger

This data type represents a Java BigInteger.
For Example: 60708090709

  • bigdecimal

It represents a Java BigDecimal.
For Example: 185.98376256272893883
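
As a rough illustration of where the simple types above show up in practice, the sketch below declares a schema that uses several of them and casts between types. The file name 'orders.txt', its comma-separated layout, and all field names are assumptions made up for this example.

```pig
-- Hypothetical input: one order per line, e.g. "1001,3,49.5,gift wrap".
orders = LOAD 'orders.txt' USING PigStorage(',')
    AS (id:long, qty:int, price:double, note:chararray);

-- int * double is evaluated as double; (float) and (chararray) are explicit casts.
totals = FOREACH orders GENERATE
    id,
    qty * price AS total,
    (float) price AS price_f,
    (chararray) qty AS qty_text;

-- Print the resulting schema to check the types Pig inferred.
DESCRIBE totals;
```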

i. Complex Types

  • Tuple

A tuple is an ordered set of fields.
For Example: (Ankit, 32)

  • Bag

A bag is a collection of tuples.
For Example: {(Ankit,32),(Neha,30)}

  • Map

A map is a set of key-value pairs.
For Example: ['name'#'Ankit', 'age'#32]
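
The complex types can also appear in a schema. The sketch below is one way to declare and read them, assuming a hypothetical tab-separated file 'students.txt' whose columns hold a tuple, a bag, and a map in Pig's text notation; all field names are invented for the example.

```pig
-- Hypothetical line: (Ankit,32)<TAB>{(math,90),(physics,85)}<TAB>[city#Pune]
students = LOAD 'students.txt' AS (
    info:tuple(name:chararray, age:int),
    courses:bag{t:tuple(course:chararray, grade:int)},
    details:map[chararray]
);

-- info.name reads a field of the tuple, COUNT counts the tuples in the bag,
-- and details#'city' looks up a key in the map.
summary = FOREACH students GENERATE info.name, COUNT(courses), details#'city';

DUMP summary;
```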

ii. Null Values

Values of all the above data types can be null. Apache Pig treats null values in the same way as SQL does.

A null can be an unknown value or a non-existent value, and we use it as a placeholder for optional values. Nulls can occur naturally in the data, or they can be the result of an operation.
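
Here is a short sketch of how nulls are usually handled in a script. It assumes a hypothetical comma-separated file 'people.txt' in which some lines have no age value.

```pig
-- Hypothetical input: some rows have an empty age column, which Pig loads as null.
people = LOAD 'people.txt' USING PigStorage(',') AS (name:chararray, age:int);

-- A comparison with null yields null, and FILTER does not pass such rows through,
-- so IS NULL and IS NOT NULL are used to test for nulls explicitly.
known = FILTER people BY age IS NOT NULL;
unknown = FILTER people BY age IS NULL;

-- Replace a null age with a default value using bincond.
filled = FOREACH people GENERATE name, ((age IS NULL) ? 0 : age) AS age;

DUMP filled;
```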

Pig Latin Arithmetic Operators

Here is the list of arithmetic operators of Pig Latin. Let's assume a = 20 and b = 40.

  • +

Addition − It adds the values on either side of the operator.
For Example: a + b gives 60.

  • -

Subtraction − It subtracts the right-hand operand from the left-hand operand.
For Example: a - b gives -20.

  • *

Multiplication − It multiplies the values on either side of the operator.
For Example: a * b gives 800.

  • /

Division − It divides the left-hand operand by the right-hand operand.
For Example: b / a gives 2.

  • %

Modulus − It divides the left-hand operand by the right-hand operand and returns the remainder.
For Example: b % a gives 0.

  • ? :

Bincond − It evaluates a Boolean expression and returns one of two values. It has three operands:
x = (expression) ? value1 : value2
Here x gets value1 if the expression is true and value2 if it is false.
For Example:

x = (a == 1) ? 20 : 40;
if a == 1, the value of x is 20.
if a != 1, the value of x is 40.

  • CASE WHEN THEN ELSE END

Case − It is equivalent to nested bincond operators.
For Example:

CASE f2 % 2
WHEN 0 THEN 'even'
WHEN 1 THEN 'odd'
END
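
In a script, these operators appear inside expressions, typically in a FOREACH ... GENERATE clause. The sketch below applies all of them to a hypothetical comma-separated file 'numbers.txt' with two integers per line; the file and the output field names are assumptions, and the CASE expression assumes a Pig release that supports it (0.12 or later, as far as we know).

```pig
-- Hypothetical input: two integers per line, e.g. "20,40".
nums = LOAD 'numbers.txt' USING PigStorage(',') AS (a:int, b:int);

calc = FOREACH nums GENERATE
    a + b AS total,
    a - b AS difference,
    a * b AS product,
    b / a AS quotient,
    b % a AS remainder,
    ((a == 1) ? 20 : 40) AS picked,
    (CASE b % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END) AS parity;

-- For the line "20,40" the values are 60, -20, 800, 2, 0, 40 and even.
DUMP calc;
```

Note that b / a is integer division here because both fields are declared as int.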

Comparison Operators in Pig Latin

Here is the list of comparison operators of Pig Latin. Let's assume a = 20 and b = 40.

  • ==

Equal − It checks whether the values of the two operands are equal. If they are, the condition becomes true.
For Example: (a == b) is not true.

  • !=

Not Equal − It checks whether the values of the two operands are equal. If they are not, the condition becomes true.
For Example: (a != b) is true.

  • >

Greater than − It checks whether the value of the left operand is greater than the value of the right operand. If it is, the condition becomes true.
For Example: (a > b) is not true.

  • <

Less than − It checks whether the value of the left operand is less than the value of the right operand. If it is, the condition becomes true.
For Example: (a < b) is true.

  • >=

Greater than or equal to − It checks whether the value of the left operand is greater than or equal to the value of the right operand. If it is, the condition becomes true.
For Example: (a >= b) is not true.

  • <=

Less than or equal to − It checks whether the value of the left operand is less than or equal to the value of the right operand. If it is, the condition becomes true.
For Example: (a <= b) is true.

  • matches

Pattern matching − It checks whether the string on the left-hand side matches the regular expression constant on the right-hand side.
For Example: f1 matches '.*dataflair.*'
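
Comparison operators are mostly used in FILTER conditions. The following sketch shows a few of them together; both input files and their field names are hypothetical.

```pig
-- Hypothetical input: two integers per line, e.g. "20,40".
nums = LOAD 'numbers.txt' USING PigStorage(',') AS (a:int, b:int);

-- Keep rows where a is strictly less than b.
smaller = FILTER nums BY a < b;

-- Conditions can be combined with AND, OR and NOT.
in_range = FILTER nums BY a != b AND a <= 100;

-- Hypothetical input: one string per line.
pages = LOAD 'pages.txt' AS (f1:chararray);

-- matches takes a regular expression that must match the whole string.
hits = FILTER pages BY f1 matches '.*dataflair.*';

DUMP hits;
```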

Type Construction Operators

Here is the list of type construction operators of Pig Latin.

  • ()

Tuple constructor operator − To construct a tuple, we use this operator.
For Example- (Ankit, 32)

  • {}

Bag constructor operator − To construct a bag, we use this operator.
For Example- {(Ankit, 32), (Neha, 30)}

  • []

Map constructor operator − To construct a map, we use this operator.
For Example- [name#Ankit, age#32]
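
To see the constructors applied to fields rather than constants, here is a small sketch. It assumes a hypothetical comma-separated file 'people.txt' with a name and an age per line. As far as the Pig documentation describes it, a map built from fields takes the key and the value separated by a comma, and the key should be a chararray.

```pig
-- Hypothetical input: a name and an age per line, e.g. "Ankit,32".
people = LOAD 'people.txt' USING PigStorage(',') AS (name:chararray, age:int);

-- () builds a tuple from the two fields.
as_tuple = FOREACH people GENERATE (name, age);

-- {} builds a bag that contains one tuple.
as_bag = FOREACH people GENERATE {(name, age)};

-- [] builds a map with name as the key and age as the value.
as_map = FOREACH people GENERATE [name, age];

DUMP as_map;
```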

So, this was all about the Pig Latin tutorial. We hope you like our explanation.

Conclusion

As a result, we have seen what Pig Latin is. We also discussed the basic Pig Latin statements, data types, and general operators, with examples. We hope this article will help you a lot. Still, if any doubt occurs, feel free to ask in the comment section.
