Execute Pig Script | Apache Pig Running Scripts and Its Comments
1. Execute Pig Script
This article explains how to execute Pig scripts. It covers the comment styles available in a Pig script, then shows how to execute a Pig script in batch mode and how to execute a Pig script stored in HDFS, with steps and examples.
2. What Is a Pig Script?
A Pig script is a single file that holds Pig Latin statements and Pig commands. It is good practice to give the file the .pig extension, although this is not required.
We can run Pig scripts from the command line and from the Grunt shell.
Pig scripts also support parameter substitution, which lets us pass values into a script when we run it.
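As a minimal sketch of parameter substitution, the shell session below writes a small script that uses two parameters and passes their values with Pig's -param option. The file name param_demo.pig, the parameter names input and n, and the temporary working directory are assumptions made for this illustration; the two sample records are taken from the Employee data used later in this article.

```shell
# Hypothetical working directory and file names, used only for this sketch.
workdir=$(mktemp -d)
data="$workdir/Employee_details.txt"
script="$workdir/param_demo.pig"

# Two sample records so the script has something to load in local mode.
cat > "$data" <<'EOF'
001,mehul,chourey,21,9848022337,Hyderabad
002,Ankur,Dutta,22,9848022338,Kolkata
EOF

# The quoted heredoc ('EOF') stops the shell from expanding $input and $n,
# so they reach Pig unchanged and Pig substitutes them itself.
cat > "$script" <<'EOF'
-- The input path and the row count are supplied with -param.
Employee = LOAD '$input' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        age:int, phone:chararray, city:chararray);
Employee_limit = LIMIT Employee $n;
DUMP Employee_limit;
EOF

# Run the script only if a pig executable is on the PATH.
if command -v pig >/dev/null 2>&1; then
    pig -x local -param input="$data" -param n=2 "$script"
else
    echo "pig not found; wrote $script"
fi
```

Keeping the path and the row count out of the script means the same file can be reused against different inputs without editing it.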
3. Comments in Pig Script
A Pig script file can contain comments in two styles:
a. Multi-line comments
Multi-line comments begin with '/*' and end with '*/'.
/* These are the multi-line comments
   in the Pig script */
b. Single-line comments
Single-line comments begin with '--'.
-- We can write single-line comments like this.
4. Executing Pig Script in Batch mode
To execute a Pig script in batch mode, follow these steps.
First, write all the required Pig Latin statements and commands in a single file and save it with the .pig extension.
Then execute the script. From the Linux shell, run it in one of the following modes:
- Local mode
$ pig -x local Sample_script.pig
- MapReduce mode
$ pig -x mapreduce Sample_script.pig
It is possible to execute it from the Grunt shell as well using the exec command.
grunt> exec /sample_script.pig
5. Executing a Pig Script from HDFS
We can also execute a Pig script that resides in HDFS. Suppose a Pig script named Sample_script.pig is stored in the HDFS directory /pig_data/. We can execute it as follows.
$ pig -x mapreduce hdfs://localhost:9000/pig_data/Sample_script.pig
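Before the command above can work, the script has to be in HDFS. The sketch below shows one way to copy it there and then run it. It assumes a local copy of Sample_script.pig in the current directory and a NameNode at localhost:9000, as in the command above; the variable names are only for illustration.

```shell
# HDFS directory and script name come from the text; the local copy of the
# script is assumed to sit in the current directory.
hdfs_dir=/pig_data
script_name=Sample_script.pig
script_uri="hdfs://localhost:9000${hdfs_dir}/${script_name}"

if command -v hdfs >/dev/null 2>&1 && command -v pig >/dev/null 2>&1 \
        && [ -f "$script_name" ]; then
    # Create the target directory if needed, then upload (overwriting any old copy).
    hdfs dfs -mkdir -p "$hdfs_dir"
    hdfs dfs -put -f "$script_name" "$hdfs_dir/"
    # Execute the script directly from its HDFS location.
    pig -x mapreduce "$script_uri"
else
    echo "Would run: pig -x mapreduce $script_uri"
fi
```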
- Pig Script Example
Suppose we have a file Employee_details.txt in HDFS with the following content.
001,mehul,chourey,21,9848022337,Hyderabad
002,Ankur,Dutta,22,9848022338,Kolkata
003,Shubham,Sengar,22,9848022339,Delhi
004,Prerna,Tripathi,21,9848022330,Pune
005,Sagar,Joshi,23,9848022336,Bhuwaneshwar
006,Monika,sharma,23,9848022335,Chennai
007,pulkit,pawar,24,9848022334,trivendram
008,Roshan,Shaikh,24,9848022333,Chennai
We also have a sample script named sample_script.pig in the same HDFS directory. It contains statements that perform operations and transformations on the Employee relation.
Employee = LOAD 'hdfs://localhost:9000/pig_data/Employee_details.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
Employee_order = ORDER Employee BY age DESC;
Employee_limit = LIMIT Employee_order 4;
DUMP Employee_limit;
- The first statement loads the data in the file Employee_details.txt as a relation named Employee.
- The second statement arranges the tuples of the relation in descending order of age and stores the result as Employee_order.
- The third statement stores the first 4 tuples of Employee_order as Employee_limit.
- The fourth and last statement dumps the content of the relation Employee_limit.
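To see which rows these statements should select, the ordering and limiting steps can be approximated with standard shell tools on a local copy of the data. This is only a rough cross-check and not how Pig executes the script. It treats age as the fourth comma-separated field of the sample data, and the temporary directory is an assumption of the sketch.

```shell
# Local copy of the sample data, written to a temporary directory.
workdir=$(mktemp -d)
data="$workdir/Employee_details.txt"
cat > "$data" <<'EOF'
001,mehul,chourey,21,9848022337,Hyderabad
002,Ankur,Dutta,22,9848022338,Kolkata
003,Shubham,Sengar,22,9848022339,Delhi
004,Prerna,Tripathi,21,9848022330,Pune
005,Sagar,Joshi,23,9848022336,Bhuwaneshwar
006,Monika,sharma,23,9848022335,Chennai
007,pulkit,pawar,24,9848022334,trivendram
008,Roshan,Shaikh,24,9848022333,Chennai
EOF

# ORDER ... BY age DESC: numeric, descending sort on field 4.
# -s keeps the input order for equal ages; Pig makes no such promise for ties.
# LIMIT ... 4: keep only the first four lines.
top4=$(sort -s -t, -k4,4nr "$data" | head -n 4)
printf '%s\n' "$top4"
```

The four rows printed are the employees aged 24 and 23. Pig prints each tuple in parentheses and shows the id as an integer, so its output is formatted differently from these raw lines.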
Now let's execute sample_script.pig.
$ pig -x mapreduce hdfs://localhost:9000/pig_data/sample_script.pig
Pig executes the script and produces output like the following:
(7,pulkit,pawar,24,9848022334,trivendram)
(8,Roshan,Shaikh,24,9848022333,Chennai)
(5,Sagar,Joshi,23,9848022336,Bhuwaneshwar)
(6,Monika,sharma,23,9848022335,Chennai)
2015-10-19 10:31:27,446 [main] INFO org.apache.pig.Main - Pig script completed in 12 minutes, 32 seconds and 751 milliseconds (752751 ms)
So, this was all about how to execute a Pig script. We hope you like our explanation.
6. Conclusion – Pig Script
In this article, we covered running Apache Pig scripts: the comment styles a script can use, executing a script in batch mode, and executing a script stored in HDFS. If you have any doubts, feel free to ask in the comment section.