Apache Pig Built in Functions Cheat Sheet


1. Apache Pig Built in Functions

In this article, “Apache Pig Built in Functions”, we will discuss the Apache Pig built-in functions in detail, including the eval, load/store, math, string, date and time, and bag and tuple functions. For each one, we will look at its syntax along with a description of what it does.

2. Introduction to Pig Functions

Apache Pig offers a large set of built-in functions: eval, load/store, math, string, date and time, and bag and tuple functions. Two main properties differentiate built-in functions from user-defined functions (UDFs):

  • We do not need to register built-in functions, since Pig knows where they are.
  • We do not need to fully qualify built-in functions when using them, because, again, Pig knows where to find them.

3. List of Apache Pig Built in Functions

Let’s discuss the various Apache Pig built-in functions (eval, load/store, bag and tuple, string, date and time, and math) one by one.

i. Eval Functions

Here are the eval functions offered by Apache Pig.

a. AVG()

  • Syntax

AVG(expression)

We use AVG() to compute the average of the numeric values in a bag.
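A minimal sketch of AVG() in a script, assuming a hypothetical comma-separated file `student.txt` with a name and a GPA per line. Since AVG() works on a bag, the tuples are first gathered into one bag with GROUP ... ALL:

```pig
-- 'student.txt' is a hypothetical input file: name,gpa per line
students = LOAD 'student.txt' USING PigStorage(',') AS (name:chararray, gpa:double);

-- GROUP ... ALL puts every tuple into a single bag named after the relation
all_students = GROUP students ALL;

-- students.gpa projects the gpa column out of that bag; AVG skips null values
gpa_avg = FOREACH all_students GENERATE AVG(students.gpa) AS avg_gpa;

DUMP gpa_avg;
```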

b. BagToString()

This function concatenates the elements of a bag into a single string. Optionally, we can place a delimiter between the values; if we do not give one, Pig uses an underscore (_).
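A minimal sketch, assuming a hypothetical file `tags.txt` of user/tag pairs; the relation and field names are illustrative only. Each user's tags are grouped into a bag and then joined with a pipe character:

```pig
-- 'tags.txt' is a hypothetical input file: user_id,tag per line
tags = LOAD 'tags.txt' USING PigStorage(',') AS (user_id:chararray, tag:chararray);

by_user = GROUP tags BY user_id;

-- Join each user's tags with '|'; element order follows the bag and is not guaranteed
tag_lists = FOREACH by_user GENERATE group AS user_id, BagToString(tags.tag, '|') AS tag_list;

DUMP tag_lists;
```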

c. CONCAT()

  • Syntax

CONCAT (expression, expression)

We use this Pig Function to concatenate two or more expressions of the same type.
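A minimal sketch, assuming a hypothetical file `people.txt` with first and last names, and a Pig release whose CONCAT() accepts more than two arguments (on older releases the calls can be nested instead, e.g. `CONCAT(CONCAT(first_name, ' '), last_name)`):

```pig
-- 'people.txt' is a hypothetical input file: first_name,last_name per line
people = LOAD 'people.txt' USING PigStorage(',') AS (first_name:chararray, last_name:chararray);

-- Three chararray arguments; if any of them is null, the result is null
full_names = FOREACH people GENERATE CONCAT(first_name, ' ', last_name) AS full_name;

DUMP full_names;
```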

d. COUNT()

  • Syntax

COUNT(expression)

We use COUNT() to get the number of elements (tuples) in a bag. While counting, it ignores tuples whose first field is null.

e. COUNT_STAR()

  • Syntax

COUNT_STAR(expression)

COUNT_STAR() is similar to COUNT(): it also returns the number of elements in a bag, but it includes tuples whose first field is null.
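The difference between COUNT() and COUNT_STAR() shows up when a column contains nulls. A minimal sketch, assuming a hypothetical file `scores.txt` in which some lines have an empty points field (which PigStorage loads as null):

```pig
-- 'scores.txt' is a hypothetical input file: player,points per line (points may be empty)
scores = LOAD 'scores.txt' USING PigStorage(',') AS (player:chararray, points:int);

all_scores = GROUP scores ALL;

-- scores.points is a single-column bag, so its first field is points itself
counts = FOREACH all_scores GENERATE
    COUNT(scores.points)      AS rows_with_points,  -- skips tuples whose first field is null
    COUNT_STAR(scores.points) AS all_rows;          -- counts every tuple, nulls included

DUMP counts;
```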

f. DIFF()

  • Syntax

DIFF (expression, expression)

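We use DIFF() to compare two fields (typically bags) in a tuple. It returns a bag of the tuples that are in one bag but not in the other; if the two bags match, the result is an empty bag.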
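A minimal sketch, assuming a hypothetical tab-separated file `bag_pairs.txt` whose lines each hold two bags, such as `{(8,9),(0,1)}` and `{(8,9),(1,1)}`:

```pig
-- Hypothetical input: two tab-separated bags per line
pairs = LOAD 'bag_pairs.txt' AS (b1:bag{t:tuple(x:int, y:int)}, b2:bag{t:tuple(x:int, y:int)});

-- DIFF returns a bag of the tuples found in only one of the two bags
diffs = FOREACH pairs GENERATE DIFF(b1, b2);

DUMP diffs;
```

For that sample line, the resulting bag would hold (0,1) and (1,1), the tuples that appear in only one of the two bags.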

g. IsEmpty()

  • Syntax

IsEmpty(expression)

We use this Apache Pig function to check if a bag or map is empty.
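A common use of IsEmpty() is finding keys that have no match in another relation. A minimal sketch, assuming hypothetical files `customers.txt` and `orders.txt`:

```pig
-- Hypothetical inputs: cust_id,name and order_id,cust_id per line
customers = LOAD 'customers.txt' USING PigStorage(',') AS (cust_id:int, name:chararray);
orders    = LOAD 'orders.txt' USING PigStorage(',') AS (order_id:int, cust_id:int);

-- COGROUP yields one tuple per cust_id, holding a bag from each relation
grouped = COGROUP customers BY cust_id, orders BY cust_id;

-- Keep the keys whose orders bag is empty: customers with no orders
no_orders = FILTER grouped BY IsEmpty(orders);

DUMP no_orders;
```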

h. MAX()

  • Syntax

MAX(expression)

MAX() calculates the highest value (numeric or chararray) for a column in a single-column bag.

i. MIN()

  • Syntax

MIN(expression)

MIN() returns the lowest value (numeric or chararray) for a column in a single-column bag.
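MAX() and MIN() are usually applied per group. A minimal sketch, assuming a hypothetical file `scores.txt` with a player name and points per line:

```pig
-- 'scores.txt' is a hypothetical input file: player,points per line
scores = LOAD 'scores.txt' USING PigStorage(',') AS (player:chararray, points:int);

by_player = GROUP scores BY player;

-- scores.points is the single-column bag of each player's points
extremes = FOREACH by_player GENERATE group AS player,
    MAX(scores.points) AS best,
    MIN(scores.points) AS worst;

DUMP extremes;
```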

j. PluckTuple()

PluckTuple() lets us define a string prefix and then filter the columns of a relation, keeping those whose names begin with that prefix.
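A minimal sketch, assuming two hypothetical tab-separated files `a.txt` and `b.txt` with the same two columns. After a join, Pig prefixes each column with its source relation (`a::`, `b::`), and PluckTuple() can keep only one side:

```pig
-- 'a.txt' and 'b.txt' are hypothetical tab-separated inputs
a = LOAD 'a.txt' AS (x:int, y:int);
b = LOAD 'b.txt' AS (x:int, y:int);

-- After the join, the columns are named a::x, a::y, b::x, b::y
c = JOIN a BY x, b BY x;

-- Keep only the columns whose names start with the prefix 'a::'
DEFINE pluck PluckTuple('a::');
d = FOREACH c GENERATE FLATTEN(pluck(*));

DESCRIBE d;
```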

k. SIZE()

  • Syntax

SIZE(expression)

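We use SIZE() to compute the number of elements of any Pig data type: the number of characters in a chararray, bytes in a bytearray, fields in a tuple, tuples in a bag, or key/value pairs in a map. For numeric types it returns 1.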
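A minimal sketch for the chararray case, assuming a hypothetical file `words.txt` with one word per line:

```pig
-- 'words.txt' is a hypothetical input file with one word per line
words = LOAD 'words.txt' AS (word:chararray);

-- For a chararray, SIZE returns the number of characters (as a long)
lengths = FOREACH words GENERATE word, SIZE(word) AS len;

DUMP lengths;
```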

l. SUBTRACT()

SUBTRACT() subtracts one bag from another. It takes two bags as input and returns a bag containing the tuples of the first bag that are not in the second bag.
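A minimal sketch, assuming a Pig release that includes SUBTRACT() and a hypothetical tab-separated file `bag_pairs.txt` with two bags per line, such as `{(8,9),(0,1)}` and `{(8,9),(1,1)}`:

```pig
-- Hypothetical input: two tab-separated bags per line
pairs = LOAD 'bag_pairs.txt' AS (b1:bag{t:tuple(x:int, y:int)}, b2:bag{t:tuple(x:int, y:int)});

-- SUBTRACT keeps the tuples of b1 that do not appear in b2
only_in_b1 = FOREACH pairs GENERATE SUBTRACT(b1, b2);

DUMP only_in_b1;
```

For that sample line the result would be a bag holding only (0,1): unlike DIFF(), SUBTRACT() is one-sided and ignores tuples that exist only in the second bag.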

m. SUM()

  • Syntax

SUM(expression)

SUM() returns the total of the numeric values of a column in a single-column bag.
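A minimal sketch, assuming a hypothetical file `sales.txt` with a region and an amount per line:

```pig
-- 'sales.txt' is a hypothetical input file: region,amount per line
sales = LOAD 'sales.txt' USING PigStorage(',') AS (region:chararray, amount:double);

by_region = GROUP sales BY region;

-- sales.amount is the single-column bag of amounts for each region
totals = FOREACH by_region GENERATE group AS region, SUM(sales.amount) AS total_amount;

DUMP totals;
```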

n. TOKENIZE()

  • Syntax

TOKENIZE(expression)

TOKENIZE() splits a string (containing a group of words) in a single tuple into words, and returns a bag containing the output of the split operation.
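TOKENIZE() is the core of the classic word count. A minimal sketch, assuming a hypothetical plain-text file `input.txt`:

```pig
-- 'input.txt' is a hypothetical plain-text file
lines = LOAD 'input.txt' USING TextLoader() AS (line:chararray);

-- TOKENIZE yields a bag with one single-field tuple per word;
-- FLATTEN turns that bag into one output row per word
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Count how often each word occurs
grouped = GROUP words BY word;
word_counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;

DUMP word_counts;
```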

ii. Load and Store Functions

Load/store functions determine how data goes into Pig and comes out of Pig. We can also write our own load/store functions, but Pig provides a set of built-in ones:

a. PigStorage()

  • Syntax

PigStorage(field_delimiter)

PigStorage() loads and stores structured text files, splitting each record into fields on the given delimiter (tab by default).
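A minimal sketch, assuming a hypothetical comma-separated file `emp.csv`; the output directory name `emp_tsv` is illustrative:

```pig
-- 'emp.csv' is a hypothetical comma-separated input file
emp = LOAD 'emp.csv' USING PigStorage(',') AS (id:int, name:chararray, dept:chararray);

-- With no argument, PigStorage uses a tab as the field delimiter
STORE emp INTO 'emp_tsv' USING PigStorage();
```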

b. TextLoader()

  • Syntax

TextLoader()

This function loads unstructured text data into Pig, one line per tuple.
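A minimal sketch, assuming a hypothetical log file `server.log`. Each line arrives as a single chararray field, so line-level filters work directly:

```pig
-- 'server.log' is a hypothetical unstructured log file
raw = LOAD 'server.log' USING TextLoader() AS (line:chararray);

-- Keep only the lines that mention ERROR
errors = FILTER raw BY line MATCHES '.*ERROR.*';

DUMP errors;
```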

c. BinStorage()

  • Syntax

BinStorage()

BinStorage() loads and stores data in Pig using a machine-readable (binary) format.
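A minimal sketch of using BinStorage() for intermediate data, assuming a hypothetical input `emp.csv` and an illustrative output directory `emp_bin`:

```pig
-- 'emp.csv' is a hypothetical comma-separated input file
emp = LOAD 'emp.csv' USING PigStorage(',') AS (id:int, name:chararray, dept:chararray);

-- Save the relation in Pig's binary format, e.g. as intermediate data between scripts
STORE emp INTO 'emp_bin' USING BinStorage();

-- Later, for example in a follow-up script, load it back with the same schema
emp_again = LOAD 'emp_bin' USING BinStorage() AS (id:int, name:chararray, dept:chararray);
DUMP emp_again;
```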

d. Handling Compression

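PigStorage() and TextLoader() can load and store gzip- and bzip2-compressed data. Pig picks the compression from the file extension (.gz, or .bz/.bz2); BinStorage() does not support compression.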
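A minimal sketch, assuming a hypothetical gzip-compressed, tab-separated file `logs.txt.gz`; the extensions on the input and output paths are what select the compression:

```pig
-- 'logs.txt.gz' is a hypothetical gzip-compressed input; the .gz extension triggers decompression
logs = LOAD 'logs.txt.gz' USING PigStorage('\t') AS (ts:chararray, level:chararray, msg:chararray);

warnings = FILTER logs BY level == 'WARN';

-- An output path ending in .bz2 makes PigStorage write bzip2-compressed files
STORE warnings INTO 'warnings.bz2' USING PigStorage('\t');
```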

iii. Bag and Tuple Functions

Here is the list of bag and tuple functions:

a. TOBAG()

  • Syntax

TOBAG(expression [, expression …])

This Pig built-in function converts one or more expressions into a bag.

b. TOP()

  • Syntax

TOP(topN, column, relation)

It returns the top N tuples of a bag (relation), compared on the column at the given position (column index, counting from 0).

c. TOTUPLE()

  • Syntax

TOTUPLE(expression [, expression …])

It converts one or more expressions into a tuple.

d. TOMAP()

  • Syntax

TOMAP(key-expression, value-expression [, key-expression, value-expression …])

It converts key-value pairs into a map.
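A minimal sketch covering these functions, assuming a hypothetical file `scores.txt` with player, game and points per line. TOTUPLE(), TOBAG() and TOMAP() build complex values from plain fields, while TOP() is applied to the bag of each group:

```pig
-- 'scores.txt' is a hypothetical input file: player,game,points per line
scores = LOAD 'scores.txt' USING PigStorage(',') AS (player:chararray, game:chararray, points:int);

-- Build complex values from plain fields
packed = FOREACH scores GENERATE
    TOTUPLE(player, points)               AS player_points,  -- (player,points)
    TOBAG(player, game)                   AS name_bag,       -- {(player),(game)}
    TOMAP('player', player, 'game', game) AS attrs;          -- map keyed by 'player' and 'game'

-- Per game, keep the 3 tuples with the highest value in column 2 (points)
by_game = GROUP scores BY game;
leaders = FOREACH by_game GENERATE group AS game, TOP(3, 2, scores) AS top3;

DUMP packed;
DUMP leaders;
```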

iv. String Functions

Here is the list of string functions in Apache Pig:

a. ENDSWITH(string, testAgainst)

ENDSWITH() verifies whether a given string ends with a particular substring.

b. STARTSWITH(string, substring)

This function accepts two string parameters and verifies whether the first string starts with the second.
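A minimal sketch, assuming a hypothetical file `paths.txt` with one file path per line:

```pig
-- 'paths.txt' is a hypothetical input file with one path per line
paths = LOAD 'paths.txt' AS (path:chararray);

-- Keep .log files under /var/log/
log_files = FILTER paths BY STARTSWITH(path, '/var/log/') AND ENDSWITH(path, '.log');

DUMP log_files;
```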

c. SUBSTRING(string, startIndex, stopIndex)

  • Syntax

SUBSTRING(string, startIndex, stopIndex)

It returns a substring of a given string, starting at startIndex (counted from 0) and ending just before stopIndex.
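A minimal sketch, assuming a hypothetical file `issues.txt` of issue keys such as `PIG-1234`:

```pig
-- 'issues.txt' is a hypothetical input file with one issue key per line
issues = LOAD 'issues.txt' AS (issue_key:chararray);

-- startIndex 0 is inclusive and stopIndex 3 is exclusive, so 'PIG-1234' gives 'PIG'
projects = FOREACH issues GENERATE issue_key, SUBSTRING(issue_key, 0, 3) AS project;

DUMP projects;
```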

d. EqualsIgnoreCase(string1, string2)

EqualsIgnoreCase() compares two strings, ignoring case.

e. INDEXOF(string, 'character', startIndex)

  • Syntax

INDEXOF(string, 'character', startIndex)

It returns the index of the first occurrence of a character in a string, searching forward from a start index.
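A minimal sketch, assuming a hypothetical file `emails.txt` with one address per line. INDEXOF() returns -1 when the character is not found:

```pig
-- 'emails.txt' is a hypothetical input file with one address per line
emails = LOAD 'emails.txt' AS (email:chararray);

-- 0-based position of '@', searching from index 0 (-1 if absent)
at_positions = FOREACH emails GENERATE email, INDEXOF(email, '@', 0) AS at_pos;

-- Keep only addresses with at least one character before the '@'
valid = FILTER at_positions BY at_pos > 0;

DUMP valid;
```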

f. LAST_INDEX_OF(expression)

  • Syntax

LAST_INDEX_OF(expression)

It returns the index of the last occurrence of a character in a string, searching backward.

g. LCFIRST(expression)

  • Syntax

LCFIRST(expression)

This Pig Function is used for conversion of the first character in a string to lowercase.

h. UCFIRST(expression)

  • Syntax

UCFIRST(expression)

It returns a string with the first character converted to uppercase.

i. UPPER(expression)

  • Syntax

UPPER(expression)

UPPER() converts all characters in a string to upper case.

j. LOWER(expression)

  • Syntax

LOWER(expression)

This function in Pig converts all characters in a string to lower case.
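A minimal sketch combining the case functions, assuming a hypothetical file `names.txt` with one name per line:

```pig
-- 'names.txt' is a hypothetical input file with one name per line
names = LOAD 'names.txt' AS (name:chararray);

normalized = FOREACH names GENERATE name,
    UPPER(name)          AS upper_name,
    LOWER(name)          AS lower_name,
    UCFIRST(LOWER(name)) AS capitalized,  -- 'aLiCe' becomes 'Alice'
    LCFIRST(name)        AS lc_first;

DUMP normalized;
```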

k. REPLACE(string, 'oldChar', 'newChar')

  • Syntax

REPLACE(string, 'oldChar', 'newChar')

It replaces existing characters in a string with new characters.
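A minimal sketch, assuming a hypothetical file `phones.txt` with numbers such as `555-123-4567`. Our understanding is that REPLACE() treats its second argument as a Java regular expression, so metacharacters such as `.` would need escaping; a plain `-` needs none:

```pig
-- 'phones.txt' is a hypothetical input file with one phone number per line
phones = LOAD 'phones.txt' AS (phone:chararray);

-- Remove every '-' so only the digits remain
digits = FOREACH phones GENERATE REPLACE(phone, '-', '') AS phone_digits;

DUMP digits;
```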

l. STRSPLIT(string, regex, limit)

  • Syntax

STRSPLIT(string, regex, limit)

It splits a string around matches of a given regular expression and returns the pieces as a tuple.

m. STRSPLITTOBAG(string, regex, limit)

It splits a string around matches of the given regular expression, like STRSPLIT(), but returns the result in a bag.
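A minimal sketch, assuming a hypothetical file `settings.txt` of `key=value` lines. With a limit of 2, the string is split at the first `=` only, so values may themselves contain `=`:

```pig
-- 'settings.txt' is a hypothetical input file with key=value lines
settings = LOAD 'settings.txt' AS (line:chararray);

-- STRSPLIT returns a tuple; FLATTEN turns its two pieces into two columns
kv = FOREACH settings GENERATE FLATTEN(STRSPLIT(line, '=', 2)) AS (k, v);

DUMP kv;
```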

n. TRIM(expression)

  • Syntax

TRIM(expression)

This Pig built-in function returns a copy of a string with leading and trailing whitespace removed.

o. LTRIM(expression)

It returns a copy of a string with leading whitespace removed.

p. RTRIM(expression)

It returns a copy of a string with trailing whitespace removed.
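A minimal sketch, assuming a hypothetical file `padded.txt` whose lines may carry leading or trailing spaces:

```pig
-- 'padded.txt' is a hypothetical input file
raw = LOAD 'padded.txt' USING TextLoader() AS (line:chararray);

trimmed = FOREACH raw GENERATE
    TRIM(line)  AS both_sides,
    LTRIM(line) AS leading_removed,
    RTRIM(line) AS trailing_removed;

DUMP trimmed;
```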

v. Date and Time Functions

Here is the list of Date and Time functions.

a. ToDate(milliseconds)

It returns a date-time object according to the given parameters. This function has several alternative forms: ToDate(isostring), ToDate(userstring, format) and ToDate(userstring, format, timezone).
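A minimal sketch of the three-argument form, assuming a hypothetical file `events.txt` with lines such as `1,2015-06-01 13:45:00`. Pig's datetime type is based on Joda-Time, so the format string follows Joda-Time pattern syntax:

```pig
-- 'events.txt' is a hypothetical input file: id,timestamp per line
events = LOAD 'events.txt' USING PigStorage(',') AS (id:int, ts:chararray);

-- ToDate(userstring, format, timezone): parse ts with the given pattern, in UTC
parsed = FOREACH events GENERATE id, ToDate(ts, 'yyyy-MM-dd HH:mm:ss', 'UTC') AS event_time;

DESCRIBE parsed;
```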

b. CurrentTime()

It returns the date-time object of the current time.

c. GetDay(datetime)

GetDay returns the day of the month from the date-time object.

d. GetHour(datetime)

GetHour returns the hour of a day from the date-time object.

e. GetMilliSecond(datetime)

It returns the millisecond of a second from the date-time object.

f. GetMinute(datetime)

GetMinute returns the minute of an hour from the date-time object.

g. GetMonth(datetime)

GetMonth returns the month of a year from the date-time object.

h. GetSecond(datetime)

It returns the second of a minute from the date-time object.

i. GetWeek(datetime)

GetWeek returns the week of a year from the date-time object.

j. GetWeekYear(datetime)

GetWeekYear returns the week year from the date-time object.

k. GetYear(datetime)

It returns the year from the date-time object.
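A minimal sketch, assuming a hypothetical file `events.txt` with an id and a `yyyy-MM-dd HH:mm:ss` timestamp per line:

```pig
-- 'events.txt' is a hypothetical input file: id,timestamp per line
events = LOAD 'events.txt' USING PigStorage(',') AS (id:int, ts:chararray);
parsed = FOREACH events GENERATE id, ToDate(ts, 'yyyy-MM-dd HH:mm:ss', 'UTC') AS event_time;

-- Each Get* function extracts one component of the datetime value
parts = FOREACH parsed GENERATE id,
    GetYear(event_time)  AS yr,
    GetMonth(event_time) AS mon,
    GetDay(event_time)   AS dy,
    GetHour(event_time)  AS hr;

DUMP parts;
```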

l. AddDuration(datetime, duration)

AddDuration adds the duration object to the date-time object and returns the result.

m. SubtractDuration(datetime, duration)

SubtractDuration subtracts the duration object from the date-time object and returns the result.
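A minimal sketch, assuming a hypothetical file `events.txt` with an id and a `yyyy-MM-dd HH:mm:ss` timestamp per line. Durations are passed as ISO 8601 duration strings:

```pig
-- 'events.txt' is a hypothetical input file: id,timestamp per line
events = LOAD 'events.txt' USING PigStorage(',') AS (id:int, ts:chararray);
parsed = FOREACH events GENERATE id, ToDate(ts, 'yyyy-MM-dd HH:mm:ss', 'UTC') AS event_time;

-- 'P1D' is one day and 'PT2H' is two hours in ISO 8601 notation
shifted = FOREACH parsed GENERATE id,
    AddDuration(event_time, 'P1D')       AS next_day,
    SubtractDuration(event_time, 'PT2H') AS two_hours_earlier;

DUMP shifted;
```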

n. DaysBetween(datetime1, datetime2)

DaysBetween returns the number of days between the two date-time objects.

o. HoursBetween(datetime1, datetime2)

It returns the number of hours between two date-time objects.

p. MilliSecondsBetween(datetime1, datetime2)

MilliSecondsBetween returns the number of milliseconds between two date-time objects.

q. MinutesBetween(datetime1, datetime2)

MinutesBetween returns the number of minutes between two date-time objects.

r. MonthsBetween(datetime1, datetime2)

MonthsBetween returns the number of months between two date-time objects.

s. SecondsBetween(datetime1, datetime2)

It returns the number of seconds between two date-time objects.

t. WeeksBetween(datetime1, datetime2)

WeeksBetween returns the number of weeks between two date-time objects.

u. YearsBetween(datetime1, datetime2)

YearsBetween returns the number of years between two date-time objects.
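A minimal sketch that computes the age of each event, assuming a hypothetical file `events.txt` with an id and a `yyyy-MM-dd HH:mm:ss` timestamp per line. We pass CurrentTime() first on the assumption that the result is datetime1 minus datetime2, which would make it non-negative for past events; it is worth confirming the sign on your Pig version:

```pig
-- 'events.txt' is a hypothetical input file: id,timestamp per line
events = LOAD 'events.txt' USING PigStorage(',') AS (id:int, ts:chararray);
parsed = FOREACH events GENERATE id, ToDate(ts, 'yyyy-MM-dd HH:mm:ss', 'UTC') AS event_time;

-- Assumption: later time first, so past events yield non-negative values
ages = FOREACH parsed GENERATE id,
    DaysBetween(CurrentTime(), event_time)  AS age_days,
    HoursBetween(CurrentTime(), event_time) AS age_hours;

DUMP ages;
```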

vi. Math Functions

We have the following math functions in Apache Pig:

a. ABS(expression)

  • Syntax

ABS(expression)

ABS() returns the absolute value of an expression.

b. ACOS(expression)

  • Syntax

ACOS(expression)

It gives the arc cosine of an expression.

c. ASIN(expression)

  • Syntax

ASIN(expression)

ASIN gives the arc sine of an expression.

d. ATAN(expression)

  • Syntax

ATAN(expression)

ATAN gives the arc tangent of an expression.

e. CBRT(expression)

  • Syntax

CBRT(expression)

It gives the cube root of an expression.

f. CEIL(expression)

  • Syntax

CEIL(expression)

CEIL is used to get the value of an expression rounded up to the nearest integer.

g. COS(expression)

  • Syntax

COS(expression)

COS() returns the trigonometric cosine of an expression.
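A minimal sketch, assuming a hypothetical file `angles.txt` with one angle in degrees per line. As with Java's `Math.cos`, the argument is expected in radians, so the angle is converted first:

```pig
-- 'angles.txt' is a hypothetical input file with one angle in degrees per line
angles = LOAD 'angles.txt' AS (deg:double);

-- Convert degrees to radians before calling COS
cosines = FOREACH angles GENERATE deg, COS(deg * 3.141592653589793 / 180.0) AS cos_deg;

DUMP cosines;
```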

h. COSH(expression)

  • Syntax

COSH(expression)

COSH gives the hyperbolic cosine of an expression.

i. EXP(expression)

  • Syntax

EXP(expression)

EXP() returns Euler’s number e raised to the power of the expression.

j. FLOOR(expression)

  • Syntax

FLOOR(expression)

FLOOR() returns the value of an expression rounded down to the nearest integer.

k. LOG(expression)

  • Syntax

LOG(expression)

LOG gives the natural logarithm (base e) of an expression.

l. LOG10(expression)

  • Syntax

LOG10(expression)

It gives the base-10 logarithm of an expression.

m. RANDOM()

  • Syntax

RANDOM()

RANDOM() returns a pseudo-random number (type double) greater than or equal to 0.0 and less than 1.0.
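RANDOM() is handy for quick sampling. A minimal sketch, assuming a hypothetical file `events.txt`; the 0.1 threshold is illustrative and keeps roughly 10% of the rows:

```pig
-- 'events.txt' is a hypothetical input file: id,timestamp per line
events = LOAD 'events.txt' USING PigStorage(',') AS (id:int, ts:chararray);

-- RANDOM() is evaluated for each tuple the filter examines
sampled = FILTER events BY RANDOM() < 0.1;

DUMP sampled;
```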

n. ROUND(expression)

  • Syntax

ROUND(expression)

ROUND gives the value of an expression rounded to an integer (if the expression is a float) or rounded to a long (if the expression is a double).
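A minimal sketch comparing the rounding functions, assuming a hypothetical file `prices.txt` with one double value per line:

```pig
-- 'prices.txt' is a hypothetical input file with one price per line
prices = LOAD 'prices.txt' AS (price:double);

-- For a double input, CEIL and FLOOR return doubles and ROUND returns a long
rounded = FOREACH prices GENERATE price,
    CEIL(price)  AS rounded_up,
    FLOOR(price) AS rounded_down,
    ROUND(price) AS nearest;

DUMP rounded;
```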

o. SIN(expression)

  • Syntax

SIN(expression)

SIN() returns the sine of an expression.

p. SINH(expression)

  • Syntax

SINH(expression)

It gives the hyperbolic sine of an expression.

q. SQRT(expression)

  • Syntax

SQRT(expression)

SQRT gives the positive square root of an expression.
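A minimal sketch, assuming a hypothetical comma-separated file `points.txt` with x and y coordinates; it computes each point's distance from the origin:

```pig
-- 'points.txt' is a hypothetical input file: x,y per line
points = LOAD 'points.txt' USING PigStorage(',') AS (x:double, y:double);

-- Euclidean distance from (0, 0)
dists = FOREACH points GENERATE x, y, SQRT(x * x + y * y) AS dist_from_origin;

DUMP dists;
```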

r. TAN(expression)

  • Syntax

TAN(expression)

TAN() returns the trigonometric tangent of an angle.

s. TANH(expression)

  • Syntax

TANH(expression)

It gives the hyperbolic tangent of an expression.

That was all on the Pig built-in functions.

4. Conclusion: Apache Pig Built in Functions

As a result, we have seen the Apache Pig built-in functions in detail, from eval and load/store functions to bag and tuple, string, date and time, and math functions.
