

{"id":5918,"date":"2018-01-17T09:06:41","date_gmt":"2018-01-17T09:06:41","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=5918"},"modified":"2018-11-16T13:51:04","modified_gmt":"2018-11-16T08:21:04","slug":"spark-paired-rdd","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/","title":{"rendered":"Introduction to Apache Spark Paired RDD"},"content":{"rendered":"<h2><span style=\"font-family: Georgia, Georgia, serif\">1. Objective<\/span><\/h2>\n<p><span style=\"font-weight: 400\">In <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-for-beginners\/\"><strong>Apache Spark<\/strong><\/a>, an RDD whose elements are key-value pairs is called a paired RDD. This Spark Paired RDD tutorial explains what paired <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-rdd-tutorial\/\"><strong>RDDs<\/strong><\/a> are in Spark. We will also learn how to create Spark paired RDDs and which operations they support, namely the <a href=\"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/\"><strong>transformations and actions<\/strong><\/a> available on them. The transformations covered include groupByKey, reduceByKey, join, and leftOuterJoin\/rightOuterJoin. 
The actions covered include countByKey. First, though, we briefly introduce Spark RDDs.<\/span><\/p>\n<p>So, let&#8217;s start the Spark Paired RDD tutorial.<\/p>\n<div id=\"attachment_42316\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-42316\" class=\"size-full wp-image-42316\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01.jpg\" alt=\"Introduction to Apache Spark Paired RDD\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01-1024x536.jpg 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01-520x272.jpg 520w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-42316\" class=\"wp-caption-text\">Introduction to Apache Spark Paired RDD<\/p><\/div>\n<h2>2.\u00a0What is Spark RDD?<\/h2>\n<p><span style=\"font-weight: 400\"><em>Apache Spark\u2019s core abstraction is the Resilient Distributed Dataset (RDD)<\/em>. It is the fundamental data structure of Spark. An RDD is an immutable, distributed collection of objects. 
Each <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-dataset-tutorial\/\"><strong>dataset<\/strong><\/a> in an RDD is divided into logical partitions, and each partition may be computed on a different node of the cluster. Spark RDDs can contain objects of any type, including user-defined classes.<\/span><\/p>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/spark-quiz-questions-part-2\/\">You must test your Spark Learning<\/a><\/strong><\/p>\n<div id=\"attachment_5978\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Paired-RDD-01-1.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5978\" class=\"wp-image-5978 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Paired-RDD-01-1.jpg\" alt=\"Spark Paired RDD\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Paired-RDD-01-1.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Paired-RDD-01-1-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Paired-RDD-01-1-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Paired-RDD-01-1-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Paired-RDD-01-1-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-5978\" class=\"wp-caption-text\">Spark Paired RDD<\/p><\/div>\n<p><span style=\"font-weight: 400\">In addition, a Spark RDD is a read-only, partitioned collection of records. 
RDDs are <a href=\"https:\/\/data-flair.training\/blogs\/fault-tolerance-in-apache-spark\/\"><strong>fault-tolerant<\/strong><\/a> collections of elements that can be operated on in parallel. We can <a href=\"https:\/\/data-flair.training\/blogs\/create-rdds-in-apache-spark\/\"><strong>create RDDs<\/strong><\/a> in three ways: from data in stable storage, from other RDDs, or by parallelizing an existing collection in the driver program. RDDs make MapReduce-style operations faster and more efficient.<\/span><\/p>\n<h2>3. Introduction to Spark Paired RDD<\/h2>\n<p><span style=\"font-weight: 400\">Spark paired RDDs are simply RDDs whose elements are key-value pairs. A key-value pair (KVP) consists of two linked data items: the key is the identifier, and the value is the data associated with that key.<\/span><br \/>\n<span style=\"font-weight: 400\">Most Spark operations work on RDDs containing any type of object. However, a few special operations are available only on RDDs of key-value pairs, such as distributed \u201cshuffle\u201d operations that group or aggregate the elements by key.<\/span><br \/>\n<span style=\"font-weight: 400\">In Scala, these operations are automatically available on RDDs containing Tuple2 objects. The key-value operations are defined in the PairRDDFunctions class, which automatically wraps around an RDD of tuples.<\/span><\/p>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/spark-sql-features\/\">Have a look at Spark SQL Features<\/a><\/strong><br \/>\n<span style=\"font-weight: 400\">For example,<\/span><br \/>\n<span style=\"font-weight: 400\">Here we use the reduceByKey operation on key-value pairs. 
This code counts how many times each line of text occurs in a file:<\/span><br \/>\n<b>val lines = sc.textFile(&quot;data1.txt&quot;)<\/b><br \/>\n<b>val pairs = lines.map(s =&gt; (s, 1))<\/b><br \/>\n<b>val counts = pairs.reduceByKey((a, b) =&gt; a + b)<\/b><br \/>\n<span style=\"font-weight: 400\">We could also call counts.sortByKey(), for example, to sort the pairs alphabetically by key.<\/span><\/p>\n<h2>4. Importance of Paired RDD in\u00a0Apache Spark<\/h2>\n<p><span style=\"font-weight: 400\">Pair RDDs are a useful building block in many programs because they expose operations that let us act on each key in parallel or regroup data across the network. For example, the reduceByKey() method aggregates data separately for each key, whereas the join() method merges two RDDs together by grouping elements with the same key. It is common to extract fields from an RDD (representing, for instance, a customer ID, an event time, or another identifier) and use those fields as keys in pair RDD operations.<\/span><\/p>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/structured-streaming-in-sparkr\/\">Do you know about Structured Streaming in SparkR<\/a><\/strong><\/p>\n<h2>5. How to Create Spark Paired RDD<\/h2>\n<p>There are several ways to create a paired RDD in Spark, for example by running a map() function that returns key-value pairs. The procedure for building a key-value RDD differs by language:<\/p>\n<h3>a. In Python language<\/h3>\n<p><span style=\"font-weight: 400\">For the functions on keyed data to work, we need to return an RDD composed of tuples. The following creates a pair RDD in Python using the first word of each line as the key:<\/span><br \/>\n<b>pairs = lines.map(lambda x: (x.split(&quot; &quot;)[0], x))<\/b><\/p>\n<h3>b. 
In Scala language<\/h3>\n<p><span style=\"font-weight: 400\">As in the Python example, we need to return tuples for the functions on keyed data to be available. An implicit conversion on RDDs of tuples provides the additional key-value functions.<\/span><\/p>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/data-type-mapping-between-r-and-spark\/\">Let&#8217;s revise Data type Mapping between R and Spark<\/a><\/strong><br \/>\n<span style=\"font-weight: 400\">Again, we create a pair RDD using the first word of each line as the key:<\/span><br \/>\n<b>val pairs = lines.map(x =&gt; (x.split(&quot; &quot;)(0), x))<\/b><\/p>\n<h3>c. In Java language<\/h3>\n<p><span style=\"font-weight: 400\">Java doesn\u2019t have a built-in tuple type, so Spark\u2019s Java API has users create tuples with the scala.Tuple2 class. We can create a new tuple in Java by writing new Tuple2(elem1, elem2) and access its elements with the _1() and _2() methods.<\/span><br \/>\n<span style=\"font-weight: 400\">In addition, when creating paired RDDs in Java, we need to call special versions of Spark\u2019s functions. 
For example, we use the mapToPair() function in place of the basic map() function.<\/span><br \/>\n<span style=\"font-weight: 400\">Again, we use the first word of each line as the key to create a Spark paired RDD:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import org.apache.spark.api.java.JavaPairRDD;\r\nimport org.apache.spark.api.java.function.PairFunction;\r\nimport scala.Tuple2;\r\n\r\n\/\/ Key each line by its first word\r\nPairFunction&lt;String, String, String&gt; keyData =\r\n    new PairFunction&lt;String, String, String&gt;() {\r\n        public Tuple2&lt;String, String&gt; call(String x) {\r\n            return new Tuple2&lt;&gt;(x.split(&quot; &quot;)[0], x);\r\n        }\r\n    };\r\nJavaPairRDD&lt;String, String&gt; pairs = lines.mapToPair(keyData);<\/pre>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/spark-interview-questions\/\">Prepare yourself for Spark Interview<\/a><\/strong><\/p>\n<h2>6. Spark Paired RDD Operations<\/h2>\n<h3>a. Transformation Operations<\/h3>\n<p><span style=\"font-weight: 400\">Paired RDDs support all the transformations available to standard RDDs, and the same rules from \u201cpassing functions to Spark\u201d apply. Because paired RDDs contain tuples, we need to pass functions that operate on tuples rather than on individual elements. 
Some of the transformation methods are described below:<\/span><\/p>\n<ul>\n<li>\n<h3><b>groupByKey<\/b><\/h3>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The groupByKey operation groups all the values that share the same key.<\/span><br \/>\n<b>rdd.groupByKey()<\/b><\/p>\n<ul>\n<li>\n<h3><b>reduceByKey(func)<\/b><\/h3>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The reduceByKey operation combines the values that share the same key using the supplied function.<\/span><br \/>\n<b>rdd.reduceByKey((x, y) =&gt; x + y)<\/b><\/p>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-graphx-features\/\">Let&#8217;s discuss Spark GraphX Features<\/a><\/strong><\/p>\n<ul>\n<li>\n<h3><b>combineByKey(createCombiner, mergeValue, mergeCombiners, partitioner)<\/b><\/h3>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The combineByKey operation combines the values that share the same key, and the result type may differ from the type of the input values.<\/span><\/p>\n<ul>\n<li>\n<h3><b>mapValues(func)<\/b><\/h3>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The mapValues operation applies a function to each value of a paired RDD without changing the key.<\/span><br \/>\n<b>rdd.mapValues(x =&gt; x+1)<\/b><\/p>\n<ul>\n<li>\n<h3><b>keys()<\/b><\/h3>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The keys() operation returns an RDD of just the keys.<\/span><br \/>\n<b>rdd.keys()<\/b><\/p>\n<ul>\n<li>\n<h3><b>values()<\/b><\/h3>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The values() operation returns an RDD of just the values.<\/span><br \/>\n<b>rdd.values()<\/b><\/p>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-mllib\/\">Let&#8217;s revise Spark MLlib Algorithms<\/a><\/strong><\/p>\n<ul>\n<li>\n<h3><b>sortByKey()<\/b><\/h3>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The sortByKey operation returns an RDD sorted by key.<\/span><br \/>\n<b>rdd.sortByKey()<\/b><\/p>\n<h3>b. 
Action Operations<\/h3>\n<p><span style=\"font-weight: 400\">As with transformations, all the actions available on standard RDDs are also available on pair RDDs. Paired RDDs also gain some additional actions that take advantage of the key-value nature of the data. Some of these actions are described below:<\/span><\/p>\n<ul>\n<li>\n<h3><b>countByKey()<\/b><\/h3>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The countByKey operation counts the number of elements for each key.<\/span><br \/>\n<b>rdd.countByKey()<\/b><\/p>\n<ul>\n<li>\n<h3><b>collectAsMap()<\/b><\/h3>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The collectAsMap() operation collects the result as a map to provide easy lookup.<\/span><br \/>\n<b>rdd.collectAsMap()<\/b><\/p>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/spark-stage\/\">Have a look at Spark Stage<\/a><\/strong><\/p>\n<ul>\n<li>\n<h3><b>lookup(key)<\/b><\/h3>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The lookup operation returns all values associated with the provided key<\/span>.<br \/>\n<b>rdd.lookup(key)<\/b><\/p>\n<h2>7. Conclusion<\/h2>\n<p><span style=\"font-weight: 400\">We have learned how to work with key-value data in Spark, how to create Spark paired RDDs, and how to use the specialized functions and operations they provide. We hope this article answered your questions on the topic. If you still have any queries, feel free to ask in the comment section.<\/span><\/p>\n<p><strong>See also &#8211;\u00a0<\/strong><\/p>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/spark-quiz-questions-part-2\/\">Spark Quiz<\/a><\/strong><br \/>\n<a href=\"https:\/\/spark.apache.org\/\"><strong>For reference<\/strong><\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. Objective In Apache Spark, key-value pairs are what we call as paired RDD. 
This Spark Paired RDD tutorial aims the information on what are paired RDDs in Spark. We will also learn following&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":42316,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[937,4407,7081,9277,9389,9390,13093,13094,14898,15841,15960,15961],"class_list":["post-5918","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-spark","tag-apache-spark-paired-rdd","tag-examples-of-spark-paired-rdd","tag-introduction-to-paired-rdds","tag-operations-in-spark-paired-rdd","tag-paired-rdd","tag-paired-rdds-in-apache-spark","tag-spark-paired-rdd","tag-spark-paired-rdd-operations","tag-transformation-operations","tag-what-is-paired-rdd","tag-what-is-spark-paired-rdd","tag-what-is-spark-rdd"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Introduction to Apache Spark Paired RDD - DataFlair<\/title>\n<meta name=\"description\" content=\"Apache Spark Paired RDD- what is spark RDD,what is Spark Paired RDD,Importance of Paired RDD in Spark,create spark paired RDD,operations in Spark Paired RDD\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Introduction to Apache Spark Paired RDD - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Apache Spark Paired RDD- what is spark RDD,what is Spark Paired RDD,Importance of Paired RDD in Spark,create spark paired RDD,operations in Spark Paired RDD\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-17T09:06:41+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-11-16T08:21:04+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Introduction to Apache Spark Paired RDD - DataFlair","description":"Apache Spark Paired RDD- what is spark RDD,what is Spark Paired RDD,Importance of Paired RDD in Spark,create spark paired RDD,operations in Spark Paired RDD","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/","og_locale":"en_US","og_type":"article","og_title":"Introduction to Apache Spark Paired RDD - DataFlair","og_description":"Apache Spark Paired RDD- what is spark RDD,what is Spark Paired RDD,Importance of Paired RDD in Spark,create spark paired RDD,operations in Spark Paired RDD","og_url":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-01-17T09:06:41+00:00","article_modified_time":"2018-11-16T08:21:04+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"Introduction to Apache Spark Paired RDD","datePublished":"2018-01-17T09:06:41+00:00","dateModified":"2018-11-16T08:21:04+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/"},"wordCount":1225,"commentCount":6,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01.jpg","keywords":["Apache Spark Paired RDD","Examples of spark paired RDD","Introduction to Paired RDDs","Operations in Spark Paired RDD","Paired RDD","Paired RDDs in Apache Spark","Spark Paired RDD","Spark Paired RDD Operations","Transformation Operations","What is Paired RDD","What is Spark Paired RDD","what is spark RDD"],"articleSection":["Apache Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/","url":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/","name":"Introduction to Apache Spark Paired RDD - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01.jpg","datePublished":"2018-01-17T09:06:41+00:00","dateModified":"2018-11-16T08:21:04+00:00","description":"Apache Spark Paired RDD- what is spark RDD,what is Spark Paired RDD,Importance of Paired RDD in Spark,create spark paired RDD,operations in Spark Paired RDD","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/Spark-Paired-RDD-01.jpg","width":1200,"height":628,"caption":"Introduction to Apache Spark Paired RDD"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/spark-paired-rdd\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Apache Spark Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/spark\/"},{"@type":"ListItem","position":3,"name":"Introduction to Apache Spark Paired RDD"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. 
Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/5918","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=5918"}],"version-history":[{"count":6,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/5918\/revisions"}],"predecessor-version":[{"id":42317,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/5918\/revisions\/42317"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/42316"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=5918"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=5918"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=5918"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}