

{"id":1461,"date":"2017-01-17T06:23:59","date_gmt":"2017-01-17T06:23:59","guid":{"rendered":"http:\/\/data-flair.training\/blogs\/?p=1461"},"modified":"2021-05-09T13:23:37","modified_gmt":"2021-05-09T07:53:37","slug":"flink-hadoop-compatability","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/","title":{"rendered":"Flink Compatibility with Hadoop &#8211; Comprehensive Tutorial"},"content":{"rendered":"<p>This tutorial on Flink compatibility with Hadoop explains how Apache Flink is compatible with Apache Hadoop.<\/p>\n<p>It also covers the basics of Hadoop and Apache Flink and compares MapReduce with Flink, which is useful background for anyone pursuing a career as a Flink professional.<\/p>\n<p>Before starting with Hadoop Flink compatibility, let us brush up on the <strong>Flink concepts <\/strong>and<strong> Hadoop concepts<\/strong>.<\/p>\n<h2>Hadoop Compatibility with Flink<\/h2>\n<p>Apache Hadoop is widely used across industries for scalable analytical data processing. Many applications implemented in Hadoop <strong>MapReduce <\/strong>run successfully on clusters.<\/p>\n<p><strong>Big data processing is maturing with Apache Flink<\/strong>, which provides an alternative to MapReduce with several improvements.<\/p>\n<p>Even if you <strong>optimize Hadoop MapReduce jobs<\/strong>, Flink can often deliver better performance than Apache Spark and Hadoop MapReduce, depending on the workload, and it offers easy-to-use APIs in Java and Scala. 
Several key features distinguish Flink in the <strong>Flink vs Spark vs<\/strong> <strong>Hadoop<\/strong> comparison.<\/p>\n<p>Flink\u2019s APIs provide interfaces for Mapper and Reducer functions, as well as InputFormats and OutputFormats, along with many more operators.<\/p>\n<p>But did you know:<\/p>\n<p>\u201cThough Hadoop MapReduce and Flink are conceptually equivalent, Hadoop\u2019s MapReduce and Flink\u2019s interfaces for these functions are not source compatible.\u201d<\/p>\n<h2>Flink Hadoop Compatibility Package<\/h2>\n<p>To close the Hadoop Flink compatibility gap, a package was developed as part of a Google Summer of Code 2014 project. This package wraps functions implemented against the MapReduce interfaces so that they can be embedded in <strong>Flink programs<\/strong>.<\/p>\n<p>The Hadoop Compatibility package allows you to reuse the following Hadoop APIs in Flink programs without changing their code:<\/p>\n<ul>\n<li>InputFormats (mapred and mapreduce APIs) as Flink DataSources<\/li>\n<li>OutputFormats (mapred and mapreduce APIs) as Flink DataSinks<\/li>\n<li>Mappers (mapred API) as FlatMap functions<\/li>\n<li>Reducers (mapred API) as GroupReduce functions<\/li>\n<\/ul>\n<h2>Using Hadoop Data Types<\/h2>\n<p>Flink natively supports all Hadoop data types, such as Writable and WritableComparable. If you only want to use Hadoop data types, you do not need to include the Hadoop compatibility dependency.<\/p>\n<h2>Project Configuration<\/h2>\n<p>Support for <strong>Hadoop<\/strong> Mappers and Reducers is contained in the flink-hadoop-compatibility Maven module. 
This code resides in the org.apache.flink.hadoopcompatibility package.<\/p>\n<p>To reuse Mappers and Reducers, add the following dependency to your pom.xml:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">&lt;dependency&gt;\n  &lt;groupId&gt;org.apache.flink&lt;\/groupId&gt;\n  &lt;artifactId&gt;flink-hadoop-compatibility_2.10&lt;\/artifactId&gt;\n  &lt;version&gt;1.1.3&lt;\/version&gt;\n&lt;\/dependency&gt;<\/pre>\n<h2>Using Hadoop InputFormats<\/h2>\n<p>To use a Hadoop InputFormat as a data source in Flink, call one of two methods of the ExecutionEnvironment: readHadoopFile (for input formats derived from FileInputFormat) or createHadoopInput (for general-purpose input formats).<\/p>\n<p>The resulting DataSet contains 2-tuples of the key and value retrieved from the Hadoop InputFormat.<br \/>\nThe following example shows how to use Hadoop\u2019s TextInputFormat:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();\n\/\/ textPath is the path of the input file; each record is a (byte offset, line) pair\nDataSet&lt;Tuple2&lt;LongWritable, Text&gt;&gt; input = env.readHadoopFile(new TextInputFormat(), LongWritable.class, Text.class, textPath);<\/pre>\n<h2>Using Hadoop OutputFormats<\/h2>\n<p>Flink provides a compatibility wrapper for Hadoop OutputFormats. The wrapper supports any class that implements org.apache.hadoop.mapred.OutputFormat or extends org.apache.hadoop.mapreduce.OutputFormat. It expects its input to be a DataSet of 2-tuples of key and value, which are then processed by the Hadoop OutputFormat.<\/p>\n<h2>Using Hadoop Mappers and Reducers<\/h2>\n<p>Flink\u2019s FlatMap functions and GroupReduce functions are equivalent to Hadoop Mappers and Reducers respectively. 
Flink provides wrappers for implementations of the Mapper and Reducer interfaces of Hadoop\u2019s mapred API, so you can use them unchanged in Flink.<\/p>\n<p>The following function wrappers can be used as regular Flink FlatMapFunctions or GroupReduceFunctions:<\/p>\n<ul>\n<li>org.apache.flink.hadoopcompatibility.mapred.HadoopMapFunction,<\/li>\n<li>org.apache.flink.hadoopcompatibility.mapred.HadoopReduceFunction, and<\/li>\n<li>org.apache.flink.hadoopcompatibility.mapred.HadoopReduceCombineFunction.<\/li>\n<\/ul>\n<h2>How to Use Hadoop Functions in a Flink Program<\/h2>\n<p>You can use Hadoop functions at any position within a <strong>Flink program <\/strong>and mix them with native Flink functions.<\/p>\n<p>This means you can implement an arbitrarily complex Flink program consisting of multiple Hadoop InputFormats and OutputFormats, Mapper and Reducer functions, without assembling a workflow of Hadoop jobs in an external driver method or using a workflow scheduler like Apache Oozie.<\/p>\n<h2>Conclusion &#8211; Hadoop Flink compatibility<\/h2>\n<p>In this Flink compatibility with Hadoop tutorial, we saw that Flink lets us reuse code written for Hadoop MapReduce, including all data types, all InputFormats and OutputFormats, and Mappers and Reducers of the mapred API.<\/p>\n<p>We can also use Hadoop functions within Flink programs and mix them with all other Flink functions. Moreover, Flink\u2019s pipelined execution makes it possible to assemble Hadoop functions arbitrarily without exchanging data via<strong> HDFS<\/strong>.<\/p>\n<p>So, this was all about Hadoop Flink compatibility. If you have any questions, leave a comment below.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial on Flink compatibility with Hadoop will help you in understanding how Apache Flink is compatible with Big data Hadoop. 
It will also help you in learning the basics of Big Data Hadoop&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":42236,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[750,1982,4738,16561,16558,16562,5186,16563,16560,16559,16564],"class_list":["post-1461","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-flink","tag-apache-flink","tag-bigdata-hadoop","tag-flink","tag-flink-compatibility","tag-flink-hadoop-compatability","tag-flink-with-hadoop","tag-hadoop","tag-hadoop-compatibility-with-flink","tag-hadoop-data-types","tag-hadoop-flink-compatability","tag-using-hadoop-data-types"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Flink Compatibility with Hadoop - Comprehensive Tutorial - DataFlair<\/title>\n<meta name=\"description\" content=\"Understand Flink Hadoop compatibility-learn how Flink is compatible with hadoop.Learn Flink features and Hadoop MapReduce features for Big data technologies\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Flink Compatibility with Hadoop - Comprehensive Tutorial - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Understand Flink Hadoop compatibility-learn how Flink is compatible with hadoop.Learn Flink features and Hadoop MapReduce features for Big data technologies\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta 
property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-01-17T06:23:59+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-05-09T07:53:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/01\/Flink-Compatibility-with-Hadoop-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Flink Compatibility with Hadoop - Comprehensive Tutorial - DataFlair","description":"Understand Flink Hadoop compatibility-learn how Flink is compatible with hadoop.Learn Flink features and Hadoop MapReduce features for Big data technologies","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/","og_locale":"en_US","og_type":"article","og_title":"Flink Compatibility with Hadoop - Comprehensive Tutorial - DataFlair","og_description":"Understand Flink Hadoop compatibility-learn how Flink is compatible with hadoop.Learn Flink features and Hadoop MapReduce features for Big data technologies","og_url":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2017-01-17T06:23:59+00:00","article_modified_time":"2021-05-09T07:53:37+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/01\/Flink-Compatibility-with-Hadoop-01.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"Flink Compatibility with Hadoop &#8211; Comprehensive Tutorial","datePublished":"2017-01-17T06:23:59+00:00","dateModified":"2021-05-09T07:53:37+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/"},"wordCount":731,"commentCount":0,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/01\/Flink-Compatibility-with-Hadoop-01.jpg","keywords":["apache flink","BigData hadoop","flink","Flink compatibility","Flink Hadoop Compatability","Flink with Hadoop","hadoop","Hadoop Compatibility with Flink","Hadoop data Types","Hadoop Flink Compatability","Using Hadoop Data Types"],"articleSection":["Apache Flink Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/","url":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/","name":"Flink Compatibility with Hadoop - Comprehensive Tutorial - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/01\/Flink-Compatibility-with-Hadoop-01.jpg","datePublished":"2017-01-17T06:23:59+00:00","dateModified":"2021-05-09T07:53:37+00:00","description":"Understand Flink Hadoop compatibility-learn how Flink is compatible with hadoop.Learn Flink features and Hadoop MapReduce features for Big data technologies","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/01\/Flink-Compatibility-with-Hadoop-01.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/01\/Flink-Compatibility-with-Hadoop-01.jpg","width":1200,"height":628,"caption":"Flink Compatibility with Hadoop - Comprehensive Tutorial"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/flink-hadoop-compatability\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Apache Flink Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/flink\/"},{"@type":"ListItem","position":3,"name":"Flink Compatibility with Hadoop &#8211; Comprehensive 
Tutorial"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The 
DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/1461","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=1461"}],"version-history":[{"count":1,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/1461\/revisions"}],"predecessor-version":[{"id":94119,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/1461\/revisions\/94119"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/42236"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=1461"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=1461"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=1461"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}