30 Best Apache Pig Interview Questions and Answers

After Part I of our top 30 Pig interview questions and answers, here we present another 30 of the best Apache Pig interview questions and answers. This article should help you prepare for questions on this tool as thoroughly as possible, since it covers the most important Pig interview topics.

So, let's start with Pig interview questions for freshers as well as for experienced professionals.

Frequently Asked Pig Interview Questions and Answers

There are many real-world Pig interview questions. Let's discuss the best Apache Pig interview questions along with their answers:

Que 1. What is Apache Pig?
Answer: Apache Pig is an open source Apache project that runs on Hadoop. It provides an engine for executing data flows in parallel on Hadoop, and it includes a language called Pig Latin for expressing these data flows. Pig Latin offers operations such as joins, sorting, filtering and many more.

Pig also lets us write User Defined Functions (UDFs) for processing data as well as for reading and writing it.

Que 2. What are the different Relational Operators available in Pig language?
Answer: They fall into the following categories:

  • Loading and Storing
  • Filtering
  • Grouping and joining
  • Sorting
  • Combining and Splitting
  • Diagnostic
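To make these categories concrete, here is a minimal Pig Latin sketch that uses an operator from several of them. The input file `students.txt`, its tab-separated layout and all field names are hypothetical. The script relies on Pig's default loader, PigStorage, which splits fields on tabs.

```pig
-- Loading (hypothetical tab-separated input: name, dept, score)
students = LOAD 'students.txt' AS (name:chararray, dept:chararray, score:int);

-- Filtering: keep only passing scores
passed = FILTER students BY score >= 50;

-- Grouping: one output tuple per department, then aggregate each group
by_dept  = GROUP passed BY dept;
dept_avg = FOREACH by_dept GENERATE group AS dept, AVG(passed.score) AS avg_score;

-- Sorting: highest average first
ranked = ORDER dept_avg BY avg_score DESC;

-- Diagnostic: print the schema of the final relation
DESCRIBE ranked;

-- Storing: write the result to an output directory
STORE ranked INTO 'dept_averages';
```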

Que 3. What are the different modes available in Pig?
Answer: There are two modes available in Pig:

  • Local Mode (Runs on localhost file system)
  • MapReduce Mode (Runs on Hadoop Cluster)

Que 4. Can we say a COGROUP is a group of more than 1 data set?
Answer: Yes. With a single data set, COGROUP behaves like GROUP and simply groups that one data set. With more than one data set, COGROUP groups all of them by the common field, which brings the related records of every data set together, much like a join. In that case, COGROUP is both a grouping of more than one data set and a way of joining those data sets.
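The following sketch shows the difference between GROUP and COGROUP. The files `owners.txt` and `friends.txt` and their fields are hypothetical. The comments describe the shape of each output tuple, not actual results.

```pig
-- Hypothetical inputs, both keyed by pet
owners  = LOAD 'owners.txt'  AS (owner:chararray, pet:chararray);
friends = LOAD 'friends.txt' AS (friend:chararray, pet:chararray);

-- GROUP on one relation: each tuple is (group, {owners tuples with that pet})
by_pet = GROUP owners BY pet;

-- COGROUP on two relations: each tuple is
-- (group, {matching owners tuples}, {matching friends tuples})
cogrouped = COGROUP owners BY pet, friends BY pet;

-- Flattening both bags yields inner-join rows; keys with an empty bag on
-- either side produce no output
joined = FOREACH cogrouped GENERATE FLATTEN(owners), FLATTEN(friends);
```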

Que 5. What does FOREACH do?
Answer: We use FOREACH to apply transformations to the data and to generate new data items. As its name suggests, the action is performed for each element of a data bag.

Syntax: alias = FOREACH bagname GENERATE expression1, expression2, ...;
The expressions listed after GENERATE are applied to the current record of the data bag.
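Here is an example that assumes a hypothetical file `prices.txt` with a symbol, an opening price and a closing price on each line. FOREACH projects existing fields and computes new ones:

```pig
-- Hypothetical input: symbol, opening price, closing price
prices = LOAD 'prices.txt' AS (symbol:chararray, open_price:double, close_price:double);

-- For each record, keep the symbol and compute a new field
changes = FOREACH prices GENERATE symbol, close_price - open_price AS daily_change;

-- Fields can also be referenced by position ($0 is the first field)
symbols = FOREACH prices GENERATE $0;
```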

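Que 6. Why do we use 'filters' in Pig scripts?
Answer: The FILTER operator selects the tuples of a relation that satisfy a condition, much like a WHERE clause in SQL. Filtering early keeps unneeded records out of later, more expensive steps such as groupings and joins.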
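Here is a minimal FILTER sketch. It assumes a hypothetical web-log file `weblogs.txt` with a visitor, a URL and an HTTP status on each line:

```pig
-- Hypothetical web-log input: visitor, URL, HTTP status
logs = LOAD 'weblogs.txt' AS (visitor:chararray, url:chararray, status:int);

-- Keep only failed requests; conditions combine with AND, OR and NOT
errors = FILTER logs BY status >= 400 AND visitor IS NOT NULL;

-- MATCHES filters on a Java regular expression that must match the whole field
pdf_hits = FILTER logs BY url MATCHES '.*[.]pdf';
```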
A related question: What are the advantages of Apache Pig?
Answer: The main advantages of Apache Pig are:

  • Less Development time

Development takes less time, which is one of Pig's major advantages.

  • Easy to learn

Apache Pig's learning curve is not steep. Even someone who does not know how to write vanilla MapReduce or SQL can write MapReduce jobs with it.

  • Procedural language

Pig Latin is a procedural language. Unlike SQL, it is not declarative. Its commands are therefore easy to follow, and it offers better expressiveness at every step of a data transformation.

Moreover, Pig Latin is much closer to English than vanilla MapReduce. It is very concise and, unlike Java, reads more like Python.

  • Dataflow

It is a data flow language. Everything revolves around data, even though this means giving up control structures such as for loops or if statements.

Data transformation is a first-class citizen. We cannot create for loops without data. Instead, we always express our work as transforming and manipulating data.

  • Easy to control Execution

Because Pig is procedural, we can control the execution of every step, and the result stays straightforward. We can also write our own UDF (User Defined Function) and inject it into one specific part of the pipeline.

  • UDFs

We can write our own UDFs to add functionality that the built-in operators do not provide.

Que 7. What is the bag?
Answer: A bag is one of the data models in Pig. It is an unordered collection of tuples that may contain duplicates. Bags are used to store collections while grouping, and they are written with "{}".
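The most common way to obtain bags is GROUP. In this sketch the file `employees.txt` is hypothetical. Each output tuple holds the grouping key in a field named `group`, plus a bag of all matching employee tuples:

```pig
-- Hypothetical input: employee name and department
emps = LOAD 'employees.txt' AS (name:chararray, dept:chararray);

-- Each tuple of by_dept is (group, {(name, dept), ...}); the bag is
-- unordered and may contain duplicate tuples
by_dept = GROUP emps BY dept;

-- Aggregate functions take the bag as input
dept_sizes = FOREACH by_dept GENERATE group AS dept, COUNT(emps) AS headcount;
```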

Que 8. Why should we use the 'order by' keyword in Pig scripts?
Answer: The ORDER BY statement sorts our data, producing a total order of our output data. Its syntax is similar to that of GROUP: we indicate a key or set of keys by which we wish to order our data.

For example:
input2 = load 'daily' as (exchanges, stocks);
grpds = order input2 by exchanges;

Que 9. What is Pig Latin?
Answer: Pig Latin is the language we use to analyze data in Hadoop with Apache Pig. An interpreter layer transforms Pig Latin statements into MapReduce jobs, which Hadoop then processes. It is a simple language with SQL-like semantics.

Que 10. What are the relational operators available related to Grouping and joining in Pig language?
Answer: The grouping and joining operators are the most powerful operators in Pig Latin, because implementing groupings and joins directly in low-level MapReduce code is very laborious.

  • JOIN

To join two or more relations.

  • GROUP

For aggregation of a single relation

  • COGROUP

For the aggregation of multiple relations

  • CROSS

To create the Cartesian product of two or more relations.
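The four operators can be compared side by side in a short sketch. The files `customers.txt` and `orders.txt` and their fields are hypothetical. CROSS is shown on full relations only for illustration, since its output grows as the product of the input sizes.

```pig
-- Hypothetical inputs
customers = LOAD 'customers.txt' AS (cust_id:int, name:chararray);
orders    = LOAD 'orders.txt'    AS (order_id:int, cust_id:int, amount:double);

-- JOIN: inner join on the shared key
cust_orders = JOIN customers BY cust_id, orders BY cust_id;

-- An outer join keeps customers that have no orders
all_customers = JOIN customers BY cust_id LEFT OUTER, orders BY cust_id;

-- GROUP: aggregate a single relation
per_cust = GROUP orders BY cust_id;
totals   = FOREACH per_cust GENERATE group AS cust_id, SUM(orders.amount) AS total;

-- COGROUP: group two relations by the same key in one step
both = COGROUP customers BY cust_id, orders BY cust_id;

-- CROSS: every customer paired with every order
pairs = CROSS customers, orders;
```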

Basic Pig Interview Questions: Q 1, 2, 3, 4, 5, 6, 7, 9
Advanced Pig Interview Questions: Q 8, 10

Que 11. What are the different String functions available in Pig?
Answer: Commonly used string functions in Pig are:

  • UPPER
  • LOWER
  • TRIM
  • SUBSTRING
  • INDEXOF
  • STRSPLIT
  • LAST_INDEX_OF
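Here is a sketch that applies these string functions inside FOREACH, assuming a hypothetical one-column file `names.txt`. SUBSTRING takes a start index and an exclusive stop index, and STRSPLIT returns a tuple rather than a bag:

```pig
-- Hypothetical input: one name per line
names = LOAD 'names.txt' AS (name:chararray);

cleaned = FOREACH names GENERATE
    TRIM(name)               AS trimmed,      -- strip leading/trailing spaces
    UPPER(name)              AS upper_name,
    LOWER(name)              AS lower_name,
    SUBSTRING(name, 0, 3)    AS first_three,  -- characters at indexes 0, 1, 2
    INDEXOF(name, 'a', 0)    AS first_a,      -- position of the first 'a'
    LAST_INDEX_OF(name, 'a') AS last_a,       -- position of the last 'a'
    STRSPLIT(name, ' ', 2)   AS parts;        -- tuple of at most 2 pieces
```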

Que 12. What is a relation in Pig?
Answer: We call a bag of tuples a Pig relation. It is similar to a table in a relational database, where the tuples in the bag correspond to the rows in a table.

One difference is that Pig relations don't require every tuple to contain the same number of fields, or the fields in the same position (column) to have the same type.

Que 13. What is a tuple?
Answer: A tuple is an ordered set of fields, where a field is a piece of data.

Que 14. What is the MapReduce plan in pig architecture?
Answer: Pig converts the physical plan into the MapReduce plan, a sequence of actual MapReduce jobs, which then execute across the Hadoop cluster.
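EXPLAIN lets us inspect these plans for any alias without running the job. The input file and fields in this sketch are hypothetical. The GROUP step is typically where the compiled MapReduce plan places the boundary between the map and reduce phases:

```pig
-- Hypothetical web-log input
logs       = LOAD 'weblogs.txt' AS (visitor:chararray, url:chararray, status:int);
errors     = FILTER logs BY status >= 400;
by_visitor = GROUP errors BY visitor;
err_counts = FOREACH by_visitor GENERATE group AS visitor, COUNT(errors) AS n;

-- Print the logical, physical and MapReduce plans for err_counts
-- without executing them
EXPLAIN err_counts;
```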

Que 15. What relational operators can we use that are related to combining and splitting in Pig language?
Answer: We use the UNION operator to combine two or more relations and the SPLIT operator to split one relation into several.
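Here is a minimal sketch of both operators, assuming two hypothetical monthly sales files with the same schema. UNION does not remove duplicates. SPLIT can send a tuple to more than one output relation if several conditions hold:

```pig
-- Hypothetical monthly files with the same schema
jan = LOAD 'sales_jan.txt' AS (item:chararray, amount:double);
feb = LOAD 'sales_feb.txt' AS (item:chararray, amount:double);

-- UNION: combine the tuples of both relations (duplicates are kept)
all_sales = UNION jan, feb;

-- SPLIT: route each tuple to every output relation whose condition it meets
SPLIT all_sales INTO small_sales IF amount < 100.0, large_sales IF amount >= 100.0;
```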

Que 16. What is a UDF in Pig?
Answer: In addition to its built-in functions, Apache Pig offers extensive support for User Defined Functions (UDFs), so we can define our own functions and use them in our scripts.

UDF support is available in six programming languages: Java, Jython, Python, JavaScript, Ruby, and Groovy.

However, full support is provided only in Java, while the remaining languages have limited support. Using Java, we can write UDFs that cover every part of the processing, such as column transformation, data load/store, and aggregation.

Since Apache Pig is written in Java, UDFs written in Java work more efficiently than those written in other languages.

Apache Pig also has a Java repository for UDFs named Piggybank. Through Piggybank we can access Java UDFs written by other users and contribute our own.
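Registering and calling UDFs might look like the sketch below. The jar `myudfs.jar`, the class `com.example.pig.ToUpper`, the script `myudfs.py` and its function `normalize` are hypothetical placeholders for your own code. REGISTER, DEFINE and the `USING jython AS` form are standard Pig Latin. Piggybank functions are used the same way: register `piggybank.jar` and refer to the fully qualified class name.

```pig
-- Java UDF: register the jar, then give the class a short alias
REGISTER myudfs.jar;                           -- hypothetical jar
DEFINE ToUpper com.example.pig.ToUpper();      -- hypothetical EvalFunc subclass

-- Python UDF run through Jython: functions live under the pyfuncs namespace
REGISTER 'myudfs.py' USING jython AS pyfuncs;  -- hypothetical script

names  = LOAD 'names.txt' AS (name:chararray);
result = FOREACH names GENERATE ToUpper(name), pyfuncs.normalize(name);  -- normalize is hypothetical
```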

Que 17. What are the primitive data types in Pig?
Answer: The primitive data types in Pig are:

  • int
  • long
  • float
  • double
  • chararray
  • bytearray

Newer Pig releases also add boolean, datetime, biginteger, and bigdecimal.
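The sketch below declares these types in a LOAD schema. The file `trades.txt` and its fields are hypothetical. Fields declared without a type default to bytearray and can be cast later:

```pig
-- Hypothetical input declaring each primitive type
trades = LOAD 'trades.txt' AS (
    trade_id:long,
    quantity:int,
    price:float,
    trade_value:double,
    symbol:chararray,
    payload:bytearray
);

-- Without declared types, fields default to bytearray; cast them explicitly
untyped = LOAD 'trades.txt' AS (trade_id, quantity);
typed   = FOREACH untyped GENERATE (long)trade_id AS trade_id, (int)quantity AS quantity;
```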

Que 18. What is bag data type in Pig?
Answer: A bag is a collection of tuples. Since a tuple's fields can themselves be bags, bags can also be nested inside tuples. Along with tuple and map, the bag is one of the complex data types in Pig Latin.
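A bag field can be declared in a LOAD schema and then processed like any other field. In this sketch the file `cities.txt` and its layout are hypothetical:

```pig
-- Hypothetical input: a city, its coordinates as a tuple, and a bag of landmarks
cities = LOAD 'cities.txt' AS (
    name:chararray,
    location:tuple(lat:double, lon:double),
    landmarks:bag{t:tuple(landmark:chararray)}
);

-- Dereference a tuple field and count the tuples inside each bag
summary = FOREACH cities GENERATE name, location.lat, COUNT(landmarks) AS n_landmarks;
```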

Que 19. Why should we use ‘distinct’ keyword in Pig scripts?
Answer: DISTINCT is a very simple statement that removes duplicate records. It works only on entire records, not on individual fields, so to get the distinct values of one field we first project that field.

For example:
input2 = load 'daily' as (exchanges, stocks);
exch = foreach input2 generate exchanges;
grpds = distinct exch;

Que 20. What are the different math functions available in Pig?
Answer: Below are the most commonly used math functions in Pig:

  • ABS
  • ACOS
  • EXP
  • LOG
  • ROUND
  • CBRT
  • RANDOM
  • SQRT
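These functions are applied per record inside FOREACH. Here is a sketch with a hypothetical file `readings.txt` of numeric sensor readings:

```pig
-- Hypothetical input: sensor id and a numeric reading
readings = LOAD 'readings.txt' AS (sensor:chararray, reading:double);

math_out = FOREACH readings GENERATE
    sensor,
    ABS(reading)   AS magnitude,
    ROUND(reading) AS rounded,     -- nearest whole number, returned as a long
    SQRT(reading)  AS square_root,
    CBRT(reading)  AS cube_root,
    EXP(reading)   AS e_to_the_x,
    LOG(reading)   AS natural_log,
    RANDOM()       AS jitter;      -- pseudo-random double between 0 and 1
```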

Basic Pig Interview Questions: Q 11, 12, 13, 16, 17, 18, 19, 20
Advanced Pig Interview Questions: Q 14, 15

Que 21. What are the different Eval functions available in Pig?
Answer: Below are the most commonly used Eval functions in Pig:

  • AVG
  • CONCAT
  • MAX
  • MIN
  • SUM
  • SIZE
  • COUNT
  • COUNT_STAR
  • DIFF
  • TOKENIZE
  • IsEmpty
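Most Eval functions work on bags, so they are usually combined with GROUP. The classic word count below uses a hypothetical text file `book.txt` and shows TOKENIZE, COUNT, COUNT_STAR, MAX and AVG together:

```pig
-- Hypothetical input: free text, one line per record
lines = LOAD 'book.txt' AS (line:chararray);

-- TOKENIZE splits a line into a bag of words; FLATTEN turns the bag into rows
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- COUNT works on the bag that GROUP builds for each word
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;

-- GROUP ... ALL puts every tuple into a single bag for global aggregates
everything = GROUP counts ALL;
summary = FOREACH everything GENERATE
    COUNT_STAR(counts) AS distinct_words,  -- COUNT_STAR also counts nulls
    MAX(counts.n)      AS top_frequency,
    AVG(counts.n)      AS mean_frequency;
```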

Que 22. What are the relational operators available related to loading and storing in Pig language?
Answer: Pig uses the following operators to load data and store it in HDFS:

  • LOAD
  • STORE

The LOAD operator loads data from the file system, whereas STORE writes data to the file system.
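LOAD and STORE both accept a load/store function through USING. This sketch assumes a hypothetical comma-separated file `input/users.csv` loaded without a schema, so its fields are referenced by position. In MapReduce mode the STORE target is a directory that must not already exist:

```pig
-- No schema: every field is a bytearray, referenced by position
users_raw = LOAD 'input/users.csv' USING PigStorage(',');
ids       = FOREACH users_raw GENERATE $0 AS id, $2 AS email;

-- Write part files into an output directory, separating fields with '|'
STORE ids INTO 'output/user_emails' USING PigStorage('|');
```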

Que 23. What are different modes of execution in Apache Pig?
Answer: The two modes are "Pig (Local Mode) Command Mode" and "Hadoop MapReduce (Java) Command Mode". Local mode needs access to only a single machine, where all files are installed and executed on the local host. MapReduce mode requires access to the Hadoop cluster.

Que 24. Does Pig support multi-line commands?
Answer: Yes. A Pig Latin statement can span several lines, because it ends only at its terminating semicolon.
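For instance, the FILTER below spans three lines but is a single statement. The file `orders.txt` and its fields are hypothetical:

```pig
orders = LOAD 'orders.txt' AS (order_id:int, amount:double, status:chararray);

-- One statement over three lines: it ends at the semicolon
big_orders = FILTER orders
             BY amount > 1000.0
             AND status == 'SHIPPED';

/* Multi-line comments use C-style delimiters;
   single-line comments start with two dashes. */
```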

Que 25. How would you diagnose or do exception handling in the Pig?
Answer: We can use the following operators to diagnose and debug a Pig script:

  • DUMP: displays the results on screen.
  • DESCRIBE: displays the schema of a particular relation.
  • ILLUSTRATE: displays the step-by-step execution of a sequence of Pig statements on sample data.
  • EXPLAIN: displays the execution plan for Pig Latin statements.
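A quick debugging session with a hypothetical input file might use these operators as follows. DESCRIBE and EXPLAIN only inspect the script, ILLUSTRATE works on a small sample, and DUMP runs the whole pipeline:

```pig
logs   = LOAD 'weblogs.txt' AS (visitor:chararray, url:chararray, status:int);
errors = FILTER logs BY status >= 400;

DESCRIBE errors;    -- print the schema of errors
ILLUSTRATE errors;  -- run the steps on a small sample and show intermediate tuples
EXPLAIN errors;     -- print the execution plans
DUMP errors;        -- run the pipeline and print every tuple to the console
```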

Que 26. What are Pig Execution modes?
Answer: Apache Pig has two execution modes, local and MapReduce. Which one we use depends on where the Pig script is going to run and where the data resides: on a single machine or in a distributed environment such as a cluster. Independently of the mode, there are three ways to run Pig programs:

  • Non-interactive shell or script mode: the user creates a file containing the code and executes the script.
  • Grunt shell or interactive shell: for running Apache Pig commands interactively.
  • Embedded mode: Pig statements are run from a host program, for example from Java through the PigServer class.

Que 27. What are the different ways of executing Pig script?
Answer: There are three ways to execute a Pig script:

  • Grunt Shell: Pig's interactive shell, in which we can execute Pig statements one at a time.
  • Script File: Write all the Pig commands in a script file and execute the file. This is executed by the Pig Server.
  • Embedded Script: Pig statements are embedded in a host program. Relatedly, if some functionality is unavailable in the built-in operators, we can create User Defined Functions (UDFs) in other languages such as Java, Python, or Ruby to provide it.
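The same few statements can be saved as a script file or typed into Grunt, as this sketch shows. The file name `daily_errors.pig`, the input file and the fields are hypothetical. The `-x local` flag selects local mode:

```pig
-- daily_errors.pig (hypothetical script file)
-- Batch run:       pig -x local daily_errors.pig
-- Interactive run: start the Grunt shell with "pig -x local" and enter
--                  the same statements at the grunt> prompt
logs   = LOAD 'weblogs.txt' AS (visitor:chararray, url:chararray, status:int);
errors = FILTER logs BY status >= 400;
STORE errors INTO 'errors_out';
```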

Que 28. What do you understand by an inner bag and an outer bag in Pig?
Answer: An outer bag, or relation, is a bag of tuples. Relations are similar to tables in relational databases.

For example:
{(Taj Mahal, Agra), (India Gate, Delhi), (Qutub Minar, Delhi)}
An inner bag, on the other hand, is a bag contained inside a tuple. For example:
(Delhi, {(India Gate, Delhi), (Qutub Minar, Delhi)})
(Agra, {(Taj Mahal, Agra)})
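Inner bags are usually produced by GROUP and processed with a nested FOREACH block. This sketch assumes a hypothetical comma-separated file `sites.txt` holding the (monument, city) pairs from the example above:

```pig
-- Hypothetical input: one "monument,city" pair per line
sites = LOAD 'sites.txt' USING PigStorage(',') AS (monument:chararray, city:chararray);

-- sites is an outer bag; GROUP builds one inner bag per city, e.g.
-- (Delhi, {(India Gate, Delhi), (Qutub Minar, Delhi)})
by_city = GROUP sites BY city;

-- A nested FOREACH processes each inner bag separately
per_city = FOREACH by_city {
    by_name    = ORDER sites BY monument;
    first_site = LIMIT by_name 1;
    GENERATE group AS city, COUNT(sites) AS n_sites, first_site;
};
```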

Que 29. State the limitations of Apache Pig.
Answer: Limitations are:

  • Delay in Execution

Commands are not executed until we DUMP or STORE an intermediate or final result. This lengthens the cycle between debugging a problem and resolving it.

  • Tooling (a minor one)

There is no good IDE or Vim plugin that offers more functionality than syntax completion for writing Pig scripts.

  • Errors of Pig

With UDFs (for example, in Python), the errors Pig produces are often not helpful at all. If anything goes wrong, Pig may just report an exec error in the UDF, even if the problem is a syntax or type error, let alone a logical one. This is a big drawback.

  • Not mature

Pig is still under development, even though it has been around for quite some time, so it is not yet fully mature.

  • Support

Searching Google and StackOverflow often does not lead to good solutions for Pig problems.
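The delayed execution described in the first limitation is easy to see with a hypothetical input file. In the Grunt shell, entering the first two statements only builds a plan, although syntax errors are still reported right away. Nothing reads the data until DUMP:

```pig
-- In the Grunt shell, these two statements only build a logical plan
logs   = LOAD 'weblogs.txt' AS (visitor:chararray, url:chararray, status:int);
errors = FILTER logs BY status >= 400;

-- Execution starts here; a runtime problem such as a missing input file
-- is reported only at this point
DUMP errors;
```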

Que 30. What are the Applications of Apache Pig?
Answer: Its applications include:

  • Processing huge data sources, such as web logs.
  • Performing ad-hoc processing and quick prototyping.
  • Data processing for search platforms.
  • Processing time-sensitive data loads.

Basic Pig Interview Questions: Q 21, 22, 24, 26, 27, 28, 30
Advanced Pig Interview Questions: Q 23, 25, 29
So, this was all about Apache Pig interview questions. We hope you liked our explanation.

Conclusion: Apache Pig Interview Questions

We have now covered the best Pig interview questions, and we have provided relevant links to learn several topics in detail. If you still have any doubts regarding Pig interview questions, feel free to ask in the comments section.
