Apache Pig Grunt Shell Commands

1. Apache Pig Grunt Shell

The Apache Pig Grunt shell offers a number of shell and utility commands. So, in this article, "Introduction to Apache Pig Grunt Shell", we will discuss all of these shell and utility commands in detail.


2. Introduction to Apache Pig Grunt Shell

We can run our Pig scripts in the shell after invoking the Grunt shell. Moreover, the Grunt shell offers certain useful shell and utility commands. So, let's discuss these commands one by one.
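For context, the Grunt shell itself is started from the operating-system command line by running the pig executable (the exact paths and available modes depend on your installation):

$ pig -x local       # start Grunt in local mode (local files, no cluster)
$ pig -x mapreduce   # start Grunt in MapReduce mode (the default)
grunt>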


3. Apache Pig Grunt Shell Commands

In order to write Pig Latin scripts, we use the Grunt shell of Apache Pig. Before that, by using the sh and fs commands, we can invoke any shell command from within it.

i. sh Command

We can invoke any shell command from the Grunt shell using the sh command. But make sure, we cannot execute commands that are a part of the shell environment (e.g., cd) using the sh command.
Syntax
The syntax of the sh command is:

grunt> sh shell command parameters

Example
By using the sh command, we can invoke the ls command of the Linux shell from the Grunt shell. Here, it lists the files in the /pig/bin/ directory.

grunt> sh ls
pig
pig_1444799121955.log
pig.cmd
pig.py

ii. fs Command

Moreover, we can invoke any HDFS file system shell (FsShell) command from the Grunt shell by using the fs command.
Syntax
The syntax of fs command is:

grunt> fs File System command parameters

Example
By using the fs command, we can invoke the ls command of HDFS from the Grunt shell. Here, it lists the files in the HDFS root directory.

grunt> fs -ls

Found 3 items

drwxrwxrwx   - Hadoop supergroup          0 2015-09-08 14:13 Hbase
drwxr-xr-x   - Hadoop supergroup          0 2015-09-09 14:52 seqgen_data
drwxr-xr-x   - Hadoop supergroup          0 2015-09-08 11:30 twitter_data

Similarly, using the fs command we can invoke all the other file system shell commands from the Grunt shell.
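For instance (the HDFS paths below are only illustrative), common file system operations can be run directly from the Grunt shell in the same way:

grunt> fs -mkdir /pig_data
grunt> fs -put Employee.txt /pig_data/
grunt> fs -cat /pig_data/Employee.txt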


4. Utility Commands

The Grunt shell offers a set of utility commands. These include clear, help, history, quit, and set. Also, there are some commands to control Pig from the Grunt shell, such as exec, kill, and run. Here is a description of the utility commands offered by the Grunt shell.

i. clear Command

In order to clear the screen of the Grunt shell, we use the clear command.
Syntax
The syntax of the clear command is:

grunt> clear

ii. help Command

The help command gives us a list of Pig commands and Pig properties.
Syntax
By using the help command, we can get a list of Pig commands.

grunt> help

Commands:

<pig latin statement>; - See the Pig Latin manual for details.

File system commands:

fs <fs arguments> - Equivalent to the Hadoop dfs command:
http://hadoop.apache.org/common/docs/current/hdfs_shell.html

Diagnostic commands:

describe <alias>[::<alias>] - Show the schema for the alias.
    Inner aliases can be described as A::B.

explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml]
    [-param <param_name>=<param_value>]
    [-param_file <file_name>] [<alias>] -
    Show the execution plan to compute the alias or for the entire script.
    -script - Explain the entire script.
    -out - Store the output into a directory rather than printing to stdout.
    -brief - Don't expand nested plans (presenting a smaller graph for the overview).
    -dot - Generate the output in .dot format. Default is text format.
    -xml - Generate the output in .xml format. Default is text format.
    -param <param_name>=<param_value> - See parameter substitution for details.
    -param_file <file_name> - See parameter substitution for details.
    alias - Alias to explain.

dump <alias> - Compute the alias and write the results to stdout.

Utility Commands:

exec [-param <param_name>=<param_value>] [-param_file <file_name>] <script> -
    Execute the script with access to the grunt environment, including aliases.
    -param <param_name>=<param_value> - See parameter substitution for details.
    -param_file <file_name> - See parameter substitution for details.
    script - Script to be executed.

run [-param <param_name>=<param_value>] [-param_file <file_name>] <script> -
    Execute the script with access to the grunt environment.
    -param <param_name>=<param_value> - See parameter substitution for details.
    -param_file <file_name> - See parameter substitution for details.
    script - Script to be executed.

sh <shell command> - Invoke a shell command.
kill <job_id> - Kill the Hadoop job specified by the Hadoop job id.
set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
    The following keys are supported:
    default_parallel - Script-level reduce parallelism. Basic input size heuristics are used by default.
    debug - Set debug on or off. Default is off.
    job.name - Single-quoted name for jobs. Default is PigLatin:<script name>.
    job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high. Default is normal.
    stream.skippath - String that contains the path. This is used by streaming.
    any Hadoop property.
help - Display this message.
history [-n] - Display the list of statements in the cache.
    -n - Hide line numbers.
quit - Quit the grunt shell.

iii. history Command

This is a very useful command; it displays a list of the statements executed since the Grunt shell was invoked.
Example
Let's suppose we have executed three statements since opening the Grunt shell:

grunt> customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');
grunt> orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
grunt> Employee = LOAD 'hdfs://localhost:9000/pig_data/Employee.txt' USING PigStorage(',');

Then, using the history command will produce the following output.

grunt> history
customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');
orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
Employee = LOAD 'hdfs://localhost:9000/pig_data/Employee.txt' USING PigStorage(',');

iv. set Command

Basically, to show or assign values to keys, we use the set command in Pig.
Syntax

There are several keys we can set values for, using this command:

  • default_parallel

By passing any whole number as a value to this key, we can set the number of reducers for a MapReduce job.

  • debug

By passing on/off to this key, we can turn the debugging feature in Pig on or off.

  • job.name

By passing a string value to this key, we can set the job name for the required job.

  • job.priority

By passing one of the following values to this key, we can set the job priority of a job:

  1. very_low
  2. low
  3. normal
  4. high
  5. very_high

  • stream.skippath

For streaming, by passing the desired path in the form of a string to this key, we can set the path from which the data is not to be transferred.
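As a quick illustration (the values chosen here are arbitrary), these keys are assigned from the Grunt shell as follows:

grunt> set default_parallel 10
grunt> set job.name 'my_first_job'
grunt> set job.priority high
grunt> set debug on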

v. quit Command

We can quit from the Grunt shell using this command.
Syntax

The syntax of the quit command is:

grunt> quit

Now see the following commands. By using them, we can control Apache Pig from the Grunt shell.

vi. exec Command

Using the exec command, we can execute Pig scripts from the Grunt shell.
Syntax
The syntax of the utility command exec is:

grunt> exec [-param param_name = param_value] [-param_file file_name] [script]

Example
Let’s suppose there is a file named Employee.txt in the /pig_data/ directory of HDFS. Its content is:

Employee.txt
001,Mehul,Hyderabad
002,Ankur,Kolkata
003,Shubham,Delhi

Now, suppose we have a script file named sample_script.pig in the /pig_data/ directory of HDFS. Its content is:

sample_script.pig
Employee = LOAD 'hdfs://localhost:9000/pig_data/Employee.txt' USING PigStorage(',')
  as (id:int,name:chararray,city:chararray);
Dump Employee;

Now, let us execute the above script from the Grunt shell using the exec command as shown below.

grunt> exec /pig_data/sample_script.pig

Output
The exec command executes the script sample_script.pig. As directed in the script, it loads the Employee.txt file into Pig, and the Dump operator displays the following content.
(1,Mehul,Hyderabad)
(2,Ankur,Kolkata)
(3,Shubham,Delhi)

vii. kill Command

By using this command, we can kill a job from the Grunt shell.
Syntax
Given below is the syntax of the kill command.

grunt> kill JobId

Example
Assume there is a running Pig job with the id Id_0055. By using the kill command, we can kill it from the Grunt shell.

grunt> kill Id_0055

viii. run Command

By using the run command, we can run a Pig script from the Grunt shell.
Syntax
The syntax of the run command is:

grunt> run [-param param_name = param_value] [-param_file file_name] script

Example
So, let’s suppose there is a file named Employee.txt in the /pig_data/ directory of HDFS. Its content is:

Employee.txt
001,Mehul,Hyderabad
002,Ankur,Kolkata
003,Shubham,Delhi

Afterwards, suppose we have a script file named sample_script.pig in the local filesystem. Its content is:

sample_script.pig
Employee = LOAD 'hdfs://localhost:9000/pig_data/Employee.txt' USING
  PigStorage(',') as (id:int,name:chararray,city:chararray);

Further, using the run command, let’s run the above script from the Grunt shell.

grunt> run /sample_script.pig

Then, using the Dump operator, we can see the output of the script.

grunt> Dump;
(1,Mehul,Hyderabad)
(2,Ankur,Kolkata)
(3,Shubham,Delhi)

Also, it is very important to note one difference between the exec and run commands: if we use run, the statements from the script become available in the command history, whereas with exec they do not.
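To see this difference (using the script path from the example above), compare the history after each command:

grunt> run /sample_script.pig
grunt> history
Employee = LOAD 'hdfs://localhost:9000/pig_data/Employee.txt' USING PigStorage(',') as (id:int,name:chararray,city:chararray);

If the same script were launched with exec instead, the history command would not list the statements from the script.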

5. Conclusion

Hence, we have seen the whole concept of the Apache Pig Grunt shell, along with its commands. Still, if any doubt occurs, feel free to ask in the comment section.
