{"id":4654,"date":"2017-11-03T08:44:33","date_gmt":"2017-11-03T08:44:33","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=4654"},"modified":"2017-11-03T08:44:33","modified_gmt":"2017-11-03T08:44:33","slug":"apache-pig-architecture","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/","title":{"rendered":"Apache Pig Architecture and Execution Modes"},"content":{"rendered":"<p><span style=\"font-weight: 400\">In this article, we will cover the\u00a0<strong>Apache Pig\u00a0Architecture<\/strong>. Apache Pig is built on top of <strong>Hadoop<\/strong> and provides a high-level language, Pig Latin, for analyzing data. We will look at the components of Apache Pig and the Pig Latin data model, and then at the two execution modes in which Pig can run.<\/span><\/p>\n<p>So, let&#8217;s start with the Apache Pig architecture.<\/p>\n<h2>What is Apache Pig Architecture?<\/h2>\n<p><span style=\"font-weight: 400\">The language used to analyze data in Hadoop with Pig is called Pig Latin. It is a high-level data processing language that provides a wide range of data types and operators for performing data operations.<\/span><\/p>\n<p><span style=\"font-weight: 400\">To perform a task using Pig, programmers write a Pig script in the Pig Latin language. They then run it with one of the execution mechanisms: the Grunt shell, a script file, or embedded mode. Scripts can also call user-defined functions (UDFs).<\/span><\/p>\n<p><span style=\"font-weight: 400\">Once submitted, these scripts go through a series of transformations inside the Pig framework, which then produces the desired output.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Internally, Apache Pig converts these scripts into a series of MapReduce jobs. 
Thus, it makes the job easier for developers.<\/span><\/p>\n<h2>Components of Apache Pig<\/h2>\n<p><span style=\"font-weight: 400\">The Apache Pig architecture consists of several components that work together to execute a script, as discussed below:<\/span><\/p>\n<div id=\"attachment_4657\" style=\"width: 812px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/11\/components-of-apache-pig.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-4657\" class=\"wp-image-4657 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/11\/components-of-apache-pig.jpg\" alt=\"Components of Apache Pig\" width=\"802\" height=\"420\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/11\/components-of-apache-pig.jpg 802w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/11\/components-of-apache-pig-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/11\/components-of-apache-pig-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/11\/components-of-apache-pig-768x402.jpg 768w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><\/a><p id=\"caption-attachment-4657\" class=\"wp-caption-text\">Components of Apache Pig<\/p><\/div>\n<h3>a. Parser<\/h3>\n<p><span style=\"font-weight: 400\">The Parser handles the Pig script first. It checks the syntax of the script and performs type checking and other checks. The output of the parser is a directed acyclic graph (DAG) that represents the Pig Latin statements and logical operators.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In the <strong>DAG<\/strong>, the logical operators of the script are represented as nodes, and the data flows are represented as edges.<\/span><\/p>\n<h3>b. 
Optimizer<\/h3>\n<p><span style=\"font-weight: 400\">The logical plan (DAG) is then passed to the logical optimizer, which carries out logical optimizations such as projection and pushdown.<\/span><\/p>\n<h3>c. Compiler<\/h3>\n<p><span style=\"font-weight: 400\">The compiler converts the optimized logical plan into a series of <strong>MapReduce jobs<\/strong>.<\/span><\/p>\n<h3>d. Execution Engine<\/h3>\n<p><span style=\"font-weight: 400\">Finally, the MapReduce jobs are submitted to Hadoop in sorted order. These jobs execute on Hadoop and produce the desired results.<\/span><\/p>\n<h2>Pig Latin Data Model<\/h2>\n<p><span style=\"font-weight: 400\">Pig Latin has a fully nested data model. It allows complex non-atomic data types such as map and tuple.<\/span><\/p>\n<h3>a. Field and Atom<\/h3>\n<p><span style=\"font-weight: 400\">An atom is any single value in Pig Latin, irrespective of its data type. It is stored as a string and can be used as a string or a number. The atomic types of Pig are int, long, float, double, chararray, and bytearray.<\/span><\/p>\n<p><span style=\"font-weight: 400\">A piece of data, or a simple atomic value, is known as a field.<\/span><br \/>\n<span style=\"font-weight: 400\"><strong>For Example<\/strong> \u2212 \u2018dataflair\u2019 or \u201812\u2019<\/span><\/p>\n<h3>b. Tuple<\/h3>\n<p><span style=\"font-weight: 400\">A record formed by an ordered set of fields is a tuple. The fields can be of any type. A tuple is similar to a row in an RDBMS table.<\/span><br \/>\n<span style=\"font-weight: 400\"><strong>For Example<\/strong> \u2212 (Dataflair, 12)<\/span><\/p>\n<h3>c. Bag<\/h3>\n<p><span style=\"font-weight: 400\">A bag is an unordered collection of tuples, which need not be unique. Each tuple may have any number of fields. A bag is represented by \u2018{}\u2019. It is similar to a table in an RDBMS. 
<\/span><\/p>\n<p><span style=\"font-weight: 400\">However, unlike a table in an RDBMS, it is not necessary that every tuple contains the same number of fields, or that the fields in the same position (column) have the same type.<\/span><\/p>\n<p><span style=\"font-weight: 400\"><strong>Example<\/strong> \u2212 {(Dataflair, 12), (Training, 11)}<\/span><br \/>\n<span style=\"font-weight: 400\">A bag can also be a field in a relation; in that context it is known as an inner bag.<\/span><br \/>\n<span style=\"font-weight: 400\"><strong>Example<\/strong> \u2212 (Dataflair, 12, {(1212121212, dt@gmail.com)})<\/span><\/p>\n<h3>d. Map<\/h3>\n<p><span style=\"font-weight: 400\">A map (or data map) is a set of<strong> key-value pairs<\/strong>. The key must be of type chararray and should be unique. The value can be of any type. A map is represented by \u2018[]\u2019.<\/span><br \/>\n<span style=\"font-weight: 400\"><strong>Example<\/strong> \u2212 [name#Dataflair, age#11]<\/span><\/p>\n<h3>e. Relation<\/h3>\n<p><span style=\"font-weight: 400\">A relation is a bag of tuples. Relations in Pig Latin are unordered, so there is no guarantee that tuples are processed in any particular order.<\/span><\/p>\n<h2>Job Execution Flow<\/h2>\n<p><span style=\"font-weight: 400\">The developer writes the script and saves it in the local file system. When the developer submits the Pig script, it is handed to the Pig Latin compiler.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The compiler then splits the task into a series of MR jobs and runs them. Meanwhile, the input data is read from<strong> HDFS<\/strong>. The output file is written back to HDFS after the MR jobs have run.<\/span><\/p>\n<h3>a. Pig Execution Modes<\/h3>\n<p><span style=\"font-weight: 400\">We can run Pig in two execution modes. The mode determines where the Pig script runs and where the data resides: on a single machine or in a distributed environment such as a cluster. 
<\/span><\/p>\n<p><span style=\"font-weight: 400\">Independently of the execution mode, there are three ways to run Pig programs:<\/span><br \/>\n<span style=\"font-weight: 400\">The first is the non-interactive shell, or script mode, in which the user creates a file containing the Pig Latin code and executes it as a script. The second is the Grunt shell, or interactive shell, in which the user runs Apache Pig commands one at a time.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The last one is embedded mode, in which Pig Latin statements are run from within a Java program, much as JDBC is used to run SQL programs from Java.<\/span><\/p>\n<h3>b. Pig Local mode<\/h3>\n<p><span style=\"font-weight: 400\">In this mode, Pig runs in a single JVM and accesses the local file system. This mode is better suited to small data sets. Parallel <strong>mapper<\/strong> execution is not possible in this mode, because older versions of Hadoop are not thread-safe.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The user can pass -x local to start Pig in local mode. In this mode, Pig always looks for a local file system path when loading data.<\/span><\/p>\n<h3>c. Pig MapReduce Mode<\/h3>\n<p><span style=\"font-weight: 400\">In this mode, the user needs a properly set up Hadoop cluster with Hadoop installed on it. By default, Apache Pig runs in MapReduce mode. Pig translates the queries into MapReduce jobs and runs them on top of the Hadoop cluster, so in this mode the work is executed on a distributed cluster.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Statements such as LOAD and STORE read data from the HDFS file system and write the output back to it. These statements, together with the other Pig Latin operators, are used to process the data.<\/span><\/p>\n<h3>d. Storing Results<\/h3>\n<p><span style=\"font-weight: 400\">Intermediate data is generated during the processing of the MR jobs. Pig stores this data in a temporary location on HDFS storage. 
This temporary location is created inside HDFS to hold the intermediate data.<\/span><\/p>\n<p><span style=\"font-weight: 400\">We can use DUMP to display the final results on the output screen. To save the results, we use the STORE operator.<\/span><\/p>\n<p>So, this was all about the Apache Pig architecture. We hope you like our explanation.<\/p>\n<h2>Conclusion &#8211; Apache Pig Architecture<\/h2>\n<p><span style=\"font-weight: 400\">Pig is popular because it provides a parallel mechanism and runs jobs across clusters. Its high-level scripting language gives developers a simple interface for getting results. Pig also provides optimization techniques for smooth data flow across a cluster.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Moreover, built-in operations for filtering, grouping, and iteration reduce the complexity of the code and run efficiently.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we will cover the\u00a0Apache Pig\u00a0Architecture. It is actually developed on top of Hadoop. Moreover, we will see the various components of Apache Pig and Pig Latin Data Model. 
The Apache Pig&#46;&#46;&#46;<\/p>\n","protected":false},"author":7,"featured_media":35514,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[40],"tags":[863,864,888,16673,1907,9496,9509],"class_list":["post-4654","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-pig","tag-apache-pig","tag-apache-pig-architecture","tag-apache-pig-tutorial","tag-apache-pig-working","tag-big-data","tag-pig-architecture","tag-pig-introduction"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Apache Pig Architecture and Execution Modes - DataFlair<\/title>\n<meta name=\"description\" content=\"Introduction to Apache pig Architecture,Pig Components Parser, optimizer, compiler, Execution Engine,Pig Latin Data Model,Job Execution Flow, Pig Local mode\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Pig Architecture and Execution Modes - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Introduction to Apache pig Architecture,Pig Components Parser, optimizer, compiler, Execution Engine,Pig Latin Data Model,Job Execution Flow, Pig Local mode\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-11-03T08:44:33+00:00\" \/>\n<meta property=\"og:image\" 
content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/11\/apache-pig-architecture-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"802\" \/>\n\t<meta property=\"og:image:height\" content=\"420\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Apache Pig Architecture and Execution Modes - DataFlair","description":"Introduction to Apache pig Architecture,Pig Components Parser, optimizer, compiler, Execution Engine,Pig Latin Data Model,Job Execution Flow, Pig Local mode","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/","og_locale":"en_US","og_type":"article","og_title":"Apache Pig Architecture and Execution Modes - DataFlair","og_description":"Introduction to Apache pig Architecture,Pig Components Parser, optimizer, compiler, Execution Engine,Pig Latin Data Model,Job Execution Flow, Pig Local 
mode","og_url":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2017-11-03T08:44:33+00:00","og_image":[{"width":802,"height":420,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/11\/apache-pig-architecture-1.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd"},"headline":"Apache Pig Architecture and Execution Modes","datePublished":"2017-11-03T08:44:33+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/"},"wordCount":1082,"commentCount":0,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/11\/apache-pig-architecture-1.jpg","keywords":["apache pig","Apache pig Architecture","Apache Pig tutorial","Apache Pig Working","big data","Pig Architecture","pig introduction"],"articleSection":["Pig 
Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/","url":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/","name":"Apache Pig Architecture and Execution Modes - DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/11\/apache-pig-architecture-1.jpg","datePublished":"2017-11-03T08:44:33+00:00","description":"Introduction to Apache pig Architecture,Pig Components Parser, optimizer, compiler, Execution Engine,Pig Latin Data Model,Job Execution Flow, Pig Local mode","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/11\/apache-pig-architecture-1.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/11\/apache-pig-architecture-1.jpg","width":802,"height":420,"caption":"Apache Pig Architecture and Execution Modes"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/apache-pig-architecture\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Pig 
Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/pig\/"},{"@type":"ListItem","position":3,"name":"Apache Pig Architecture and Execution Modes"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd","name":"DataFlair 
Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"DataFlair Team specializes in creating clear, actionable content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Backed by industry expertise, we make learning easy and career-oriented for beginners and pros alike.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam3\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/4654","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=4654"}],"version-history":[{"count":0,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/4654\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/35514"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=4654"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=4654"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=4654"}],"curies":[{"name":
"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}