Execute Pig Script | Apache Pig Running Scripts and Its Comments


In this article, we will walk through how to execute a Pig script. We will also cover the two kinds of comments that Pig supports, which help when writing a script in a file.

Moreover, we will see how to execute a Pig script in batch mode and how to execute a Pig script stored in HDFS, with steps and examples.


What Are Apache Pig Scripts?

Basically, we use Pig scripts to place Pig Latin statements and Pig commands in a single file. Although it is not required, it is good practice to give the file a .pig extension.

Moreover, we can run Pig scripts from the command line and from the Grunt shell.

Also, Pig scripts allow us to pass values to parameters using parameter substitution.
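
As a minimal sketch of parameter substitution, the script below refers to two parameters, $input and $limit, and their values are supplied on the command line with Pig's -param option. The parameter names, the file name param_demo.pig, and the two-line data file are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch: parameter substitution in a Pig script.
# File names, parameter names, and the sample data are illustrative assumptions.
workdir=$(mktemp -d)
script="$workdir/param_demo.pig"
printf '1,a\n2,b\n' > "$workdir/data.txt"

# The quoted heredoc delimiter stops the shell from expanding $input and
# $limit, so Pig sees them and substitutes the values at run time.
cat > "$script" <<'EOF'
-- input and limit are supplied with -param on the command line
records = LOAD '$input' USING PigStorage(',');
few = LIMIT records $limit;
DUMP few;
EOF

# Run only if Pig is installed; otherwise just print the script.
if command -v pig >/dev/null 2>&1; then
  pig -x local -param input="$workdir/data.txt" -param limit=1 "$script"
else
  cat "$script"
fi
```

Keeping the path and the limit outside the script means the same file can be reused for different inputs without editing it.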

Comments in Pig Script

We can include comments in a Pig script while writing it in a file. Pig supports two kinds:

a. Multi-line comments
Multi-line comments begin with '/*' and end with '*/'.

/* These are the multi-line comments
 In the pig script */

b. Single-line comments
Single-line comments begin with '--'.

-- We can write single-line comments like this.

Executing Pig Script in Batch mode

Follow these steps to execute a Pig script in batch mode.
Step 1
First, write all the required Pig Latin statements and commands in a single file, and save it as a .pig file.
Step 2
Then, execute the Pig script. To run it from the Linux shell, use one of the following:

  • Local mode
$ pig -x local Sample_script.pig
  • MapReduce mode
$ pig -x mapreduce Sample_script.pig

It is also possible to execute the script from the Grunt shell, using the exec command.

grunt> exec /sample_script.pig
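
The two steps above can be wrapped in a small shell helper. This is only a sketch: run_pig_script is a hypothetical function, not something Pig provides, and it accepts only the two execution modes discussed in this article.

```shell
#!/bin/sh
# Sketch: run a Pig script in batch mode with basic argument checks.
# run_pig_script is a hypothetical helper, not part of Pig.
run_pig_script() {
  mode=$1
  script=$2

  # Accept only the two execution modes covered in this article.
  case "$mode" in
    local|mapreduce) ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac

  # Step 1 must already have produced the script file.
  if [ ! -f "$script" ]; then
    echo "script not found: $script" >&2
    return 2
  fi

  # Step 2: hand the file to Pig, or print the command if Pig is not installed.
  if command -v pig >/dev/null 2>&1; then
    pig -x "$mode" "$script"
  else
    echo "pig -x $mode $script"
  fi
}
```

For example, run_pig_script local Sample_script.pig runs the script in local mode, and the helper returns a non-zero status when the mode or the file is wrong.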


Executing a Pig Script from HDFS

We can also execute a Pig script that resides in HDFS. Let's assume there is a Pig script named Sample_script.pig in the HDFS directory /pig_data/. We can execute it as follows:

$ pig -x mapreduce hdfs://localhost:9000/pig_data/Sample_script.pig
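
Before the script can be run this way, it has to be copied into HDFS. The sketch below shows one way to do that with the hdfs dfs command, assuming the cluster's default file system is the hdfs://localhost:9000 namenode used in this article and that Sample_script.pig is in the current local directory.

```shell
#!/bin/sh
# Sketch: copy a local Pig script into HDFS, then run it from there.
# The namenode address and the /pig_data directory are the ones used in this
# article; it is an assumption that they match the local Hadoop setup.
namenode="hdfs://localhost:9000"
hdfs_dir="/pig_data"
script_name="Sample_script.pig"
script_uri="$namenode$hdfs_dir/$script_name"

if command -v hdfs >/dev/null 2>&1 && command -v pig >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "$hdfs_dir"                # create the target directory if needed
  hdfs dfs -put -f "$script_name" "$hdfs_dir/"  # upload, overwriting any older copy
  pig -x mapreduce "$script_uri"                # run the script straight from HDFS
else
  echo "would run: pig -x mapreduce $script_uri"
fi
```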

Pig Script Example

Suppose we have a file Employee_details.txt in HDFS with the following content.

Employee_details.txt
001,Mehul,Chourey,21,9848022337,Hyderabad
002,Ankur,Dutta,22,9848022338,Kolkata
003,Shubham,Sengar,22,9848022339,Delhi
004,Prerna,Tripathi,21,9848022330,Pune
005,Sagar,Joshi,23,9848022336,Bhuwaneshwar
006,Monika,Sharma,23,9848022335,Chennai
007,Pulkit,Pawar,24,9848022334,trivendram
008,Roshan,Shaikh,24,9848022333,Chennai

We also have a sample script named sample_script.pig in the same HDFS directory. It contains statements that perform operations and transformations on the Employee relation.

-- Each record has six comma-separated fields; age is needed by ORDER below.
Employee = LOAD 'hdfs://localhost:9000/pig_data/Employee_details.txt' USING PigStorage(',')
  AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
Employee_order = ORDER Employee BY age DESC;
Employee_limit = LIMIT Employee_order 4;
DUMP Employee_limit;
  • The first statement loads the data in the file Employee_details.txt as a relation named Employee.
  • The second statement arranges the tuples of the relation in descending order of age and stores the result as Employee_order.
  • The third statement stores the first 4 tuples of Employee_order as Employee_limit.
  • Finally, the fourth statement dumps the content of the relation Employee_limit.
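
To sanity-check what the ordering and limit statements should return, the same steps can be approximated locally with sort and head. This is only a rough cross-check on a local copy of the employee records, not how Pig executes the script. The -s flag keeps records with equal ages in file order, which Pig itself may not do.

```shell
#!/bin/sh
# Sketch: approximate ORDER ... BY age DESC followed by LIMIT 4 with coreutils.
# This is a local cross-check only, not how Pig runs the script.
data=$(mktemp)
cat > "$data" <<'EOF'
001,Mehul,Chourey,21,9848022337,Hyderabad
002,Ankur,Dutta,22,9848022338,Kolkata
003,Shubham,Sengar,22,9848022339,Delhi
004,Prerna,Tripathi,21,9848022330,Pune
005,Sagar,Joshi,23,9848022336,Bhuwaneshwar
006,Monika,Sharma,23,9848022335,Chennai
007,Pulkit,Pawar,24,9848022334,trivendram
008,Roshan,Shaikh,24,9848022333,Chennai
EOF

# Field 4 is age: sort on it numerically in reverse (descending) order,
# keep ties in input order (-s), then take the first four records.
top4=$(sort -t, -k4,4nr -s "$data" | head -n 4)
printf '%s\n' "$top4"
```

The four records printed are those with ids 007, 008, 005, and 006, the employees aged 24 and 23.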

Now, let's execute sample_script.pig.

$ pig -x mapreduce hdfs://localhost:9000/pig_data/sample_script.pig

Pig executes the script and produces output like the following:

(7,Pulkit,Pawar,24,9848022334,trivendram)
(8,Roshan,Shaikh,24,9848022333,Chennai)
(5,Sagar,Joshi,23,9848022336,Bhuwaneshwar)
(6,Monika,Sharma,23,9848022335,Chennai)
2015-10-19 10:31:27,446 [main] INFO  org.apache.pig.Main - Pig script completed in 12 minutes, 32 seconds and 751 milliseconds (752751 ms)

So, this was all about how to execute a Pig script. We hope you liked our explanation.

Conclusion – Pig Script

In this article, we have seen how to run Apache Pig scripts, both in batch mode and from HDFS. We have also seen the two kinds of comments a Pig script can contain. Still, if you have any doubts, feel free to ask in the comment section.
