{"id":682,"date":"2016-07-26T05:19:08","date_gmt":"2016-07-26T05:19:08","guid":{"rendered":"http:\/\/data-flair.training\/blogs\/?p=682"},"modified":"2018-11-20T14:16:40","modified_gmt":"2018-11-20T08:46:40","slug":"spark-rdd-operations-transformations-actions","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/","title":{"rendered":"Spark RDD Operations-Transformation &amp; Action with Example"},"content":{"rendered":"<h2>1. Spark RDD Operations<\/h2>\n<p>Two types of<strong> Apache Spark<\/strong>\u00a0RDD operations are- Transformations and Actions. A <strong>Transformation<\/strong> is a function that produces new<strong> RDD<\/strong> from the existing RDDs but when we want to work with the actual dataset, at that point <strong>Action<\/strong> is performed. When the action is triggered after the result, new RDD is not formed like transformation. In this <strong><a href=\"https:\/\/data-flair.training\/blogs\/spark-tutorial\/\">Apache Spark<\/a><\/strong> RDD operations tutorial we will get the detailed view of what is Spark RDD, what is the transformation in Spark RDD, various RDD transformation operations in Spark with examples, what is action in Spark RDD and various RDD action operations in Spark with examples.<\/p>\n<div id=\"attachment_42915\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-42915\" class=\"size-full wp-image-42915\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01.jpg\" alt=\"Spark RDD Operations-Transformation &amp; Action with Example\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01.jpg 1200w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01-1024x536.jpg 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01-520x272.jpg 520w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-42915\" class=\"wp-caption-text\">Spark RDD Operations-Transformation &amp; Action with Example<\/p><\/div>\n<h2>2. Apache Spark RDD Operations<\/h2>\n<p>Before we start with Spark RDD operations, let us take a deep dive into <strong><a href=\"http:\/\/data-flair.training\/blogs\/rdd-in-apache-spark\/\">RDD in Spark.<\/a><\/strong><br \/>\nApache Spark RDD supports two types of operations:<\/p>\n<ul>\n<li>Transformations<\/li>\n<li>Actions<\/li>\n<\/ul>\n<p>Let us first understand what a Spark RDD transformation and an action are.<\/p>\n<h2>3. RDD Transformation<\/h2>\n<p>A <strong>Spark Transformation<\/strong> is a function that produces a new RDD from existing RDDs. It takes an RDD as input and produces one or more RDDs as output. Every transformation we apply creates a new RDD; the input RDDs cannot be changed, since RDDs are immutable in nature.<\/p>\n<p>Applying transformations builds an <strong>RDD lineage<\/strong>, which records all the parent RDDs of the final RDD(s). 
RDD lineage is\u00a0also known as the <strong>RDD operator graph\u00a0<\/strong>or\u00a0<strong>RDD dependency graph.<\/strong>\u00a0It is a logical execution plan, i.e., a Directed Acyclic Graph (<strong><a href=\"http:\/\/data-flair.training\/blogs\/directed-acyclic-graph-dag-in-apache-spark\/\">DAG<\/a><\/strong>) of all the parent RDDs of an RDD.<\/p>\n<p><strong><a href=\"http:\/\/data-flair.training\/blogs\/lazy-evaluation-in-apache-spark-guide\/\">Transformations are lazy<\/a><\/strong> in nature, i.e., they are not executed immediately; they run only when we call an action. The two most basic transformations are map() and filter().<br \/>\nAfter the transformation, the resultant RDD is always different from its parent RDD. It can be smaller (e.g.\u00a0filter(),\u00a0distinct(),\u00a0sample()), bigger (e.g.\u00a0flatMap(),\u00a0union(),\u00a0cartesian()) or the same size (e.g.\u00a0map()).<\/p>\n<p>There are two types of transformations:<\/p>\n<ul>\n<li><strong>Narrow transformation &#8211;\u00a0<\/strong>In a <em>narrow transformation<\/em>, all the elements required to compute the records in a single partition live in a single partition of the parent RDD. 
A limited subset of partition is used to calculate the result.\u00a0<em>Narrow transformations<\/em>\u00a0are the result of <em>map(), filter().<\/em><\/li>\n<\/ul>\n<div id=\"attachment_3864\" style=\"width: 668px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/spark-narrow-transformation-2.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3864\" class=\"wp-image-3864 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/spark-narrow-transformation-2.jpg\" alt=\"Apache Spark Narrow Transformation Operation\" width=\"658\" height=\"345\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/spark-narrow-transformation-2.jpg 658w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/spark-narrow-transformation-2-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/spark-narrow-transformation-2-300x157.jpg 300w\" sizes=\"auto, (max-width: 658px) 100vw, 658px\" \/><\/a><p id=\"caption-attachment-3864\" class=\"wp-caption-text\">Apache Spark Narrow Transformation Operation<\/p><\/div>\n<ul>\n<li><strong>Wide transformation &#8211;\u00a0<\/strong>In wide transformation, all the elements that are required to compute the records in the single partition may live in many partitions of parent RDD. 
Such transformations therefore require a shuffle.\u00a0<em>Wide transformations<\/em> result from operations such as <em>groupByKey()<\/em> and <em>reduceByKey()<\/em>.<\/li>\n<\/ul>\n<div id=\"attachment_3865\" style=\"width: 669px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/spark-wide-transformation-1.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3865\" class=\"wp-image-3865 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/spark-wide-transformation-1.jpg\" alt=\"Spark Wide Transformation Operations\" width=\"659\" height=\"345\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/spark-wide-transformation-1.jpg 659w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/spark-wide-transformation-1-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/spark-wide-transformation-1-300x157.jpg 300w\" sizes=\"auto, (max-width: 659px) 100vw, 659px\" \/><\/a><p id=\"caption-attachment-3865\" class=\"wp-caption-text\">Spark Wide Transformation Operations<\/p><\/div>\n<p>Spark provides many RDD transformation functions. Let us look at them with examples.<\/p>\n<h3>3.1. map(func)<\/h3>\n<p>The map function iterates over every element of the RDD and produces a new RDD. The <strong>map()<\/strong> transformation takes in any function, and that function is applied to every element of the RDD.<\/p>\n<p>With map(), the input type and the return type of the RDD may differ from each other. 
For example, the input RDD can be of type String and, after applying the map() function, the returned RDD can be of type Boolean.<\/p>\n<p>For example, in RDD {1, 2, 3, 4, 5} if we apply \u201crdd.map(x =&gt; x + 2)\u201d we will get the result (3, 4, 5, 6, 7).<\/p>\n<p>Also Read: <strong><a href=\"http:\/\/data-flair.training\/blogs\/how-to-create-rdds-in-apache-spark\/\">How to create RDD<\/a><\/strong><\/p>\n<p><strong>Map() example:<\/strong><\/p>\n<p>[php]import org.apache.spark.sql.SparkSession<br \/>\nobject mapTest {<br \/>\ndef main(args: Array[String]): Unit = {<br \/>\n\/\/ Local SparkSession; .rdd exposes the underlying RDD[String]<br \/>\nval spark = SparkSession.builder.appName(&quot;mapExample&quot;).master(&quot;local&quot;).getOrCreate()<br \/>\nval data = spark.read.textFile(&quot;spark_test.txt&quot;).rdd<br \/>\n\/\/ Pair each line with its length<br \/>\nval mapFile = data.map(line =&gt; (line, line.length))<br \/>\nmapFile.foreach(println)<br \/>\n}<br \/>\n}[\/php]<\/p>\n<p><strong>spark_test.txt:<\/strong><\/p>\n<pre>hello...user! this file is created to check the operations of spark.<\/pre>\n<pre>?, and how can we apply functions on that RDD partitions?. All this will be done through spark programming which is done with the help of scala language support\u2026<\/pre>\n<ul>\n<li><em><strong>Note &#8211;\u00a0<\/strong><\/em>In the above code, the map() function maps each line of the file to a pair of the line and its length.<\/li>\n<\/ul>\n<h3>3.2. flatMap()<\/h3>\n<p>With the <strong>flatMap()<\/strong> function, each input element can produce zero or more elements in the output RDD. The simplest use of flatMap() is to split each input string into words.<br \/>\nmap() and flatMap() are similar in that they take a line from the input RDD and apply a function to that line. 
The key <strong><a href=\"http:\/\/data-flair.training\/blogs\/map-vs-flatmap-operation-in-apache-spark\/\">difference between map() and flatMap()<\/a><\/strong> is that map() returns exactly one element per input element, while flatMap() can return a list of elements.<\/p>\n<p><strong>flatMap() example:<\/strong><\/p>\n<p>[php]val data = spark.read.textFile(&quot;spark_test.txt&quot;).rdd<br \/>\nval flatmapFile = data.flatMap(lines =&gt; lines.split(&quot; &quot;))<br \/>\nflatmapFile.foreach(println)[\/php]<\/p>\n<ul>\n<li><strong><em>Note<\/em> &#8211; <\/strong>In the above code, the flatMap() function splits each line into words wherever a space occurs.<\/li>\n<\/ul>\n<h3>3.3. filter(func)<\/h3>\n<p>The Spark RDD <strong>filter()<\/strong> function returns a new RDD containing only the elements that satisfy a predicate. It is a <em>narrow operation<\/em>\u00a0because it does not shuffle data from one partition to many partitions.<\/p>\n<p>For example, suppose an RDD contains the first five natural numbers (1, 2, 3, 4, and 5) and the predicate checks for an even number. The resulting RDD after the filter will contain only the even numbers, i.e., 2 and 4.<\/p>\n<p><strong>Filter() example:<\/strong><\/p>\n<p>[php]val data = spark.read.textFile(&quot;spark_test.txt&quot;).rdd<br \/>\nval mapFile = data.flatMap(lines =&gt; lines.split(&quot; &quot;)).filter(value =&gt; value == &quot;spark&quot;)<br \/>\nprintln(mapFile.count())[\/php]<\/p>\n<ul>\n<li><em><strong>Note\u00a0<\/strong>&#8211;<\/em> In the above code, flatMap() splits each line into words, filter() keeps only the words equal to \u201cspark\u201d, and the count() action returns how many such words there are.<\/li>\n<\/ul>\n<p><strong>Read: <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-rdd-vs-dataframe-vs-dataset\/\">Apache Spark RDD vs DataFrame vs DataSet<\/a><\/strong><\/p>\n<h3>3.4. 
mapPartitions(func)<\/h3>\n<p>The<strong> mapPartitions()<\/strong> transformation converts each\u00a0<em>partition<\/em>\u00a0of the source RDD into zero or more elements of the result. The supplied function is called once per partition and receives an iterator over that partition's elements, rather than being called once per element as in map(). In other words, mapPartitions() is like map(), but it runs separately on each partition (block) of the RDD.<\/p>\n<h3>3.5. mapPartitionsWithIndex()<\/h3>\n<p>It is like mapPartitions(), but it also provides\u00a0<em>func<\/em>\u00a0with an integer value representing the index of the partition.<\/p>\n<p class=\"entry-title \"><strong>Learn: <a href=\"https:\/\/data-flair.training\/blogs\/scala-spark-shell-commands\/\">Spark Shell Commands to Interact with Spark-Scala<\/a><\/strong><\/p>\n<h3>3.6. union(dataset)<\/h3>\n<p>With the <strong>union()<\/strong> function, we get the elements of both RDDs in a new RDD. The key rule of this function is that the two RDDs should be of the same type. Duplicates are not removed.<br \/>\nFor example, the elements of <strong>RDD1<\/strong> are (Spark, Spark,<strong><a href=\"http:\/\/data-flair.training\/blogs\/hadoop-introduction-tutorial-quick-guide\/\"> Hadoop<\/a><\/strong>, <strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-flink-tutorial-comprehensive-guide\/\">Flink<\/a><\/strong>) and those of<strong> RDD2<\/strong> are (<a href=\"http:\/\/data-flair.training\/blogs\/why-learn-big-data-use-cases\/\"><strong>Big data<\/strong><\/a>, Spark, Flink), so the resultant <em><strong>rdd1.union(rdd2)<\/strong><\/em> will have elements (Spark, Spark, Spark, Hadoop, Flink, Flink, Big data).<\/p>\n<p><strong>Union() example:<\/strong><\/p>\n<p>[php]val rdd1 = spark.sparkContext.parallelize(Seq((1,&quot;jan&quot;,2016),(3,&quot;nov&quot;,2014),(16,&quot;feb&quot;,2014)))<br \/>\nval rdd2 = spark.sparkContext.parallelize(Seq((5,&quot;dec&quot;,2014),(17,&quot;sep&quot;,2015)))<br \/>\nval rdd3 = 
spark.sparkContext.parallelize(Seq((6,&quot;dec&quot;,2011),(16,&quot;may&quot;,2015)))<br \/>\nval rddUnion = rdd1.union(rdd2).union(rdd3)<br \/>\nrddUnion.foreach(println)[\/php]<\/p>\n<ul>\n<li><em><strong>Note &#8211;<\/strong><\/em>\u00a0In the above code, the union() operation returns a new dataset that contains the union of the elements in the source dataset (rdd1) and the arguments (rdd2 &amp; rdd3).<\/li>\n<\/ul>\n<h3>3.7. intersection(other-dataset)<\/h3>\n<p>With the <strong>intersection()<\/strong> function, we get only the common elements of both RDDs in a new RDD, without duplicates. The key rule of this function is that the two RDDs should be of the same type.<br \/>\nConsider an example: the elements of <strong>RDD1<\/strong> are (Spark, Spark, Hadoop, Flink) and those of <strong>RDD2<\/strong> are (Big data, Spark, Flink), so the resultant <strong><em>rdd1.intersection(rdd2)<\/em><\/strong> will have elements (Spark, Flink).<\/p>\n<p><strong>Intersection() example:<\/strong><\/p>\n<p>[php]val rdd1 = spark.sparkContext.parallelize(Seq((1,&quot;jan&quot;,2016),(3,&quot;nov&quot;,2014),(16,&quot;feb&quot;,2014)))<br \/>\nval rdd2 = spark.sparkContext.parallelize(Seq((5,&quot;dec&quot;,2014),(1,&quot;jan&quot;,2016)))<br \/>\nval common = rdd1.intersection(rdd2)<br \/>\ncommon.foreach(println)[\/php]<\/p>\n<ul>\n<li><strong><em>Note\u00a0<\/em>&#8211;<\/strong>\u00a0The intersection() operation returns a new RDD that contains the elements present in both rdd1 &amp; rdd2, here (1,jan,2016).<\/li>\n<\/ul>\n<p><strong>Learn to <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-installation-on-ubuntu\/\">Install Spark on Ubuntu<\/a><\/strong><\/p>\n<h3>3.8. distinct()<\/h3>\n<p>It returns a new dataset that contains the <strong>distinct<\/strong> elements of the source dataset. 
It is helpful for removing duplicate data.<br \/>\nFor example, if an RDD has elements (Spark, Spark, Hadoop, Flink),<em><strong>\u00a0<\/strong><\/em>then <em><strong>rdd.distinct()<\/strong><\/em> will give elements (Spark, Hadoop, Flink).<\/p>\n<p><strong>Distinct() example:<\/strong><\/p>\n<p>[php]val rdd1 = spark.sparkContext.parallelize(Seq((1,&quot;jan&quot;,2016),(3,&quot;nov&quot;,2014),(16,&quot;feb&quot;,2014),(3,&quot;nov&quot;,2014)))<br \/>\nval result = rdd1.distinct()<br \/>\nprintln(result.collect().mkString(&quot;, &quot;))[\/php]<\/p>\n<ul>\n<li><em><strong>Note &#8211;<\/strong><\/em> In the above example, the distinct function will remove the duplicate record, i.e. (3,&quot;nov&quot;,2014).<\/li>\n<\/ul>\n<h3>3.9. groupByKey()<\/h3>\n<p>When we use <strong>groupByKey()<\/strong> on a dataset of (K, V) pairs, the data is shuffled according to the key K into another RDD. Because every value is sent across the network without being combined first, this transformation can transfer a lot of unnecessary data.<\/p>\n<p>Spark can spill data to disk when more data is shuffled onto a single executor machine than can fit in memory. Follow this link to <strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-rdd-persistence-caching\/\">learn about RDD <\/a><\/strong><strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-rdd-persistence-caching\/\">Caching and Persistence mechanism<\/a><\/strong> in detail.<\/p>\n<p><strong>groupByKey() example:<\/strong><\/p>\n<p>[php]val data = spark.sparkContext.parallelize(Array(('k',5),('s',3),('s',4),('p',7),('p',5),('t',8),('k',6)),3)<br \/>\nval group = data.groupByKey().collect()<br \/>\ngroup.foreach(println)[\/php]<\/p>\n<ul>\n<li><em><strong>Note &#8211;<\/strong><\/em>\u00a0groupByKey() will group the integers that share the same key (letter). 
After that, the\u00a0<em>collect()<\/em> action will return all the elements of the dataset as an Array.<\/li>\n<\/ul>\n<h3>3.10. reduceByKey(func, [numTasks])<\/h3>\n<p>When we use\u00a0<strong>reduceByKey<\/strong>\u00a0on a dataset of (K, V) pairs, the pairs on the same machine with the same key are combined before the data is shuffled.<\/p>\n<p><strong>reduceByKey() example:<\/strong><\/p>\n<p>[php]val words = Array(&quot;one&quot;,&quot;two&quot;,&quot;two&quot;,&quot;four&quot;,&quot;five&quot;,&quot;six&quot;,&quot;six&quot;,&quot;eight&quot;,&quot;nine&quot;,&quot;ten&quot;)<br \/>\nval data = spark.sparkContext.parallelize(words).map(w =&gt; (w,1)).reduceByKey(_+_)<br \/>\ndata.foreach(println)[\/php]<\/p>\n<ul>\n<li><em><strong>Note &#8211;<\/strong><\/em> The above code parallelizes the Array of Strings. It then maps each word to a count of 1, and reduceByKey merges the counts of values that have the same key.<\/li>\n<\/ul>\n<p><strong>Read: <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-rdd-features\/\">Various Features of RDD<\/a><\/strong><\/p>\n<h3>3.11. sortByKey()<\/h3>\n<p>When we apply the\u00a0<strong>sortByKey() function<\/strong>\u00a0on a dataset of (K, V) pairs, the data is sorted according to the key K into another RDD.<\/p>\n<p><strong>sortByKey() example:<\/strong><\/p>\n<p>[php]val data = spark.sparkContext.parallelize(Seq((&quot;maths&quot;,52), (&quot;english&quot;,75), (&quot;science&quot;,82), (&quot;computer&quot;,65), (&quot;maths&quot;,85)))<br \/>\nval sorted = data.sortByKey()<br \/>\n\/\/ collect() first so the rows are printed in sorted order on the driver<br \/>\nsorted.collect().foreach(println)[\/php]<\/p>\n<ul>\n<li><strong><em>Note<\/em> &#8211;<\/strong> In the above code, the\u00a0sortByKey() transformation sorts the data RDD in ascending order of the key (String).<\/li>\n<\/ul>\n<p><strong>Read: <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-rdd-limitations\/\">Limitations of RDD<\/a><\/strong><\/p>\n<h3>3.12. 
join()<\/h3>\n<p><strong> Join <\/strong>is a database term. It combines the fields from two tables using common values. The join() operation in Spark is defined on pair RDDs, i.e., RDDs in which each element is a tuple whose first element is the key and whose second element is the value.<\/p>\n<p>The benefit of using keyed data is that we can combine datasets. The join() operation combines two datasets on the basis of the key.<\/p>\n<p><strong>Join() example:<\/strong><\/p>\n<p>[php]val data = spark.sparkContext.parallelize(Array(('A',1),('b',2),('c',3)))<br \/>\nval data2 = spark.sparkContext.parallelize(Array(('A',4),('A',6),('b',7),('c',3),('c',8)))<br \/>\nval result = data.join(data2)<br \/>\nprintln(result.collect().mkString(&quot;,&quot;))[\/php]<\/p>\n<ul>\n<li><em><strong>Note<\/strong> &#8211;<\/em>\u00a0 The join() transformation joins two different RDDs on the basis of the key; each result element has the form (key, (value from data, value from data2)).<\/li>\n<\/ul>\n<p class=\"entry-title \"><strong>Read: <a href=\"https:\/\/data-flair.training\/blogs\/rdd-lineage\/\">RDD lineage in Spark: ToDebugString Method<\/a><\/strong><\/p>\n<h3>3.13. coalesce()<\/h3>\n<p>To avoid a full shuffle of the data, we use the coalesce() function. <strong>coalesce()<\/strong> reuses existing partitions so that less data is shuffled. With it we can reduce the number of partitions. Suppose we have four nodes and we want only two nodes. 
Then the data of the two extra nodes is moved onto the two nodes that we keep.<\/p>\n<p><strong>Coalesce() example:<\/strong><\/p>\n<p>[php]val rdd1 = spark.sparkContext.parallelize(Array(&quot;jan&quot;,&quot;feb&quot;,&quot;mar&quot;,&quot;april&quot;,&quot;may&quot;,&quot;jun&quot;),3)<br \/>\nval result = rdd1.coalesce(2)<br \/>\nresult.foreach(println)[\/php]<\/p>\n<ul>\n<li><strong><em>Note<\/em> &#8211;<\/strong>\u00a0coalesce() decreases the number of partitions of the source RDD to the numPartitions value given as its argument.<\/li>\n<\/ul>\n<h2>4. RDD Action<\/h2>\n<p><strong>Transformations<\/strong> <a href=\"http:\/\/data-flair.training\/blogs\/how-to-create-rdds-in-apache-spark\/\"><strong>create RDDs<\/strong><\/a> from each other, but when we want to work with the actual dataset, an action is performed. Unlike a transformation, an action does not produce a new RDD. Thus, actions are Spark RDD operations that return non-RDD values. The results of an action are returned to the driver or written to an external storage system. Actions are what set the lazily defined transformations in motion.<\/p>\n<p>An action is one of the ways of sending data from the <em>executors<\/em> to the <em>driver.<\/em> Executors are agents that are responsible for executing tasks, while the driver is a JVM process that coordinates the workers and the execution of tasks. Some of the actions of Spark are:<\/p>\n<h3>4.1. 
count()<\/h3>\n<p>The action<strong> count()<\/strong> returns the number of elements in the RDD.<\/p>\n<p>For example, if an RDD has values {1, 2, 2, 3, 4, 5, 5, 6}, \u201crdd.count()\u201d will give the result 8.<\/p>\n<p><strong>Count() example:<\/strong><\/p>\n<p>[php]val data = spark.read.textFile(&quot;spark_test.txt&quot;).rdd<br \/>\nval mapFile = data.flatMap(lines =&gt; lines.split(&quot; &quot;)).filter(value =&gt; value == &quot;spark&quot;)<br \/>\nprintln(mapFile.count())[\/php]<\/p>\n<ul>\n<li><strong><em>Note<\/em> &#8211;<\/strong> In the above code, the<em> flatMap()<\/em> function splits each line into words, filter() keeps only the words equal to \u201cspark\u201d, and the <em>count()<\/em> action returns how many such words there are.<\/li>\n<\/ul>\n<p><strong>Learn: <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-streaming-tutorial\/\">Spark Streaming<\/a><\/strong><\/p>\n<h3>4.2. collect()<\/h3>\n<p>The action<strong>\u00a0collect()<\/strong> is the most common and simplest operation; it returns the entire content of the RDD to the driver program. A typical application of collect() is unit testing, where the entire RDD is expected to fit in memory, which makes it easy to compare the RDD's result with the expected result.<br \/>\nThe constraint of collect() is that all the data must fit in the memory of a single machine, because it is copied to the driver.<\/p>\n<p><strong>Collect() example:<\/strong><\/p>\n<p>[php]val data = spark.sparkContext.parallelize(Array(('A',1),('b',2),('c',3)))<br \/>\nval data2 = spark.sparkContext.parallelize(Array(('A',4),('A',6),('b',7),('c',3),('c',8)))<br \/>\nval result = data.join(data2)<br \/>\nprintln(result.collect().mkString(&quot;,&quot;))[\/php]<\/p>\n<ul>\n<li><strong><em>Note<\/em> &#8211;<\/strong> The<em> join()<\/em> transformation in the above code joins the two RDDs on the basis of the same key (letter). 
After that, the\u00a0<em>collect()<\/em>\u00a0action will return all the elements of the dataset to the driver as an Array.<\/li>\n<\/ul>\n<h3>4.3. take(n)<\/h3>\n<p>The action <strong>take(n)<\/strong> returns the first n elements of the RDD. It tries to minimize the number of partitions it accesses, so the result may be a biased sample of the data. Unless the RDD has a defined order, we should not presume which elements are returned.<\/p>\n<p>For example, for an RDD created in this order from {1, 2, 2, 3, 4, 5, 5, 6}, \u201ctake(4)\u201d will give the result {1, 2, 2, 3}.<\/p>\n<p><strong>Take() example:<\/strong><\/p>\n<p>[php]val data = spark.sparkContext.parallelize(Array(('k',5),('s',3),('s',4),('p',7),('p',5),('t',8),('k',6)),3)<br \/>\n\/\/ Keep the grouped data as an RDD so that take() runs as a Spark action<br \/>\nval group = data.groupByKey()<br \/>\nval twoRec = group.take(2)<br \/>\ntwoRec.foreach(println)[\/php]<\/p>\n<ul>\n<li><em><strong>Note<\/strong><\/em> &#8211; The\u00a0<em>take(2)<\/em> action returns an array with the first <em>n<\/em> elements of the dataset, where <em>n<\/em> is the argument passed to take().<\/li>\n<\/ul>\n<p class=\"entry-title \"><strong>Learn: <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-dstream-discretized-streams\/\">Apache Spark DStream (Discretized Streams)<\/a><\/strong><\/p>\n<h3>4.4. top()<\/h3>\n<p>If an ordering is defined for the elements of our RDD, then we can extract the largest elements from it using <strong>top()<\/strong>. The action\u00a0<em>top()<\/em> uses the default ordering of the data unless another ordering is supplied.<\/p>\n<p><strong>Top() example:<\/strong><\/p>\n<p>[php]val data = spark.read.textFile(&quot;spark_test.txt&quot;).rdd<br \/>\nval mapFile = data.map(line =&gt; (line,line.length))<br \/>\nval res = mapFile.top(3)<br \/>\nres.foreach(println)[\/php]<\/p>\n<ul>\n<li><em><strong>Note<\/strong><\/em> &#8211; The <em>map()<\/em> operation maps each line to a pair of the line and its length, and top(3) returns the 3 largest records from mapFile according to the default ordering, in descending order.<\/li>\n<\/ul>\n<h3>4.5. 
countByValue()<\/h3>\n<p>The <strong>countByValue()<\/strong> action returns how many times each element occurs in the RDD.<\/p>\n<p>For example, if an RDD has values {1, 2, 2, 3, 4, 5, 5, 6}, \u201crdd.countByValue()\u201d\u00a0 will give the result {(1,1), (2,2), (3,1), (4,1), (5,2), (6,1)}.<\/p>\n<p><strong>countByValue() example:<\/strong><\/p>\n<p>[php]val data = spark.read.textFile(&quot;spark_test.txt&quot;).rdd<br \/>\nval result = data.map(line =&gt; (line,line.length)).countByValue()<br \/>\nresult.foreach(println)[\/php]<\/p>\n<ul>\n<li><strong><em>Note<\/em> &#8211;<\/strong>\u00a0The\u00a0<em>countByValue()<\/em>\u00a0action returns a map of (value, Long) pairs holding the count of each distinct value.<\/li>\n<\/ul>\n<p><strong>Learn:<a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-streaming-transformation-operations\/\"> Apache Spark Streaming Transformation Operations<\/a><\/strong><\/p>\n<h3>4.6. reduce()<\/h3>\n<p>The<strong> reduce()<\/strong> function takes two elements from the RDD as input and produces an output of the same type as the input elements. The simplest form of such a function is addition. With it we can, for example, sum the elements of an RDD or count the number of words. It accepts a commutative and associative operation as its argument.<\/p>\n<p><strong>Reduce() example:<\/strong><\/p>\n<p>[php]val rdd1 = spark.sparkContext.parallelize(List(20,32,45,62,8,5))<br \/>\nval sum = rdd1.reduce(_+_)<br \/>\nprintln(sum)[\/php]<\/p>\n<ul>\n<li><em><strong>Note<\/strong><\/em> &#8211; The\u00a0<em>reduce()<\/em>\u00a0action in the above code adds up the elements of the source RDD.<\/li>\n<\/ul>\n<h3>4.7. fold()<\/h3>\n<p>The signature of<strong> fold() <\/strong>is like that of\u00a0<em>reduce()<\/em>, except that it also takes a \u201czero value\u201d as input, which is used for the initial call on each partition. The <strong>condition on the zero value<\/strong> is that it should be the <strong>identity element of that operation<\/strong>. 
The key difference between<em> fold()<\/em> and<em> reduce()<\/em> is that <em>reduce()<\/em> throws an exception for an empty collection, whereas <em>fold()<\/em> is defined for an empty collection.<\/p>\n<p>For example, zero is the identity element for addition and one is the identity element for multiplication. The return type of <em>fold()<\/em> is the same as the element type of the RDD we are operating on.<br \/>\nFor example, rdd.fold(0)((x, y) =&gt; x + y).<\/p>\n<p><strong>Fold() example:<\/strong><\/p>\n<p>[php]val rdd1 = spark.sparkContext.parallelize(List((&quot;maths&quot;, 80),(&quot;science&quot;, 90)))<br \/>\n\/\/ Not an identity element, so it is added more than once (see the note)<br \/>\nval additionalMarks = (&quot;extra&quot;, 4)<br \/>\nval sum = rdd1.fold(additionalMarks){ (acc, marks) =&gt; val add = acc._2 + marks._2<br \/>\n(&quot;total&quot;, add)<br \/>\n}<br \/>\nprintln(sum)[\/php]<\/p>\n<ul>\n<li><strong><em>Note<\/em> &#8211;<\/strong> In the above code, <em>additionalMarks<\/em> is the zero value. Because the zero value is used once for every partition and once more when the partition results are merged, a non-identity zero value such as this one makes the total depend on the number of partitions.<\/li>\n<\/ul>\n<p><strong>Learn: <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-streaming-checkpoint\/\">Spark Streaming Checkpoint in Apache Spark<\/a><\/strong><\/p>\n<h3>4.8. aggregate()<\/h3>\n<p>It gives us the flexibility to return a data type different from the input type. <strong>aggregate()<\/strong> takes two functions to get the final result. The first function combines an element of our RDD with the accumulator, and the second merges two accumulators. Hence, in aggregate() we supply an initial zero value of the type that we want to return.<\/p>\n<h3>4.9. foreach()<\/h3>\n<p>When we want to apply an operation to each element of an RDD without returning any value to the <em>driver<\/em>, the <strong>foreach()<\/strong> function is useful. 
For example, inserting a record into the database.<\/p>\n<p><strong>Foreach() example:<\/strong><\/p>\n<p>[php]val data = spark.sparkContext.parallelize(Array(('k',5),('s',3),('s',4),('p',7),('p',5),('t',8),('k',6)),3)<br \/>\n\/\/ Keep the grouped data as an RDD so that foreach() runs as a Spark action<br \/>\nval group = data.groupByKey()<br \/>\ngroup.foreach(println)[\/php]<\/p>\n<ul>\n<li><strong><em>Note<\/em> <em>&#8211;<\/em><\/strong><em>\u00a0The foreach()<\/em>\u00a0action runs a function <em>(println)<\/em> on each element of the dataset group.<\/li>\n<\/ul>\n<h2>5. Conclusion<\/h2>\n<p>In conclusion, applying a transformation to an RDD creates another RDD, because RDDs are immutable in nature. The result is computed only when an action is invoked on an RDD. This lazy evaluation reduces the overhead of computation and makes the system more efficient.<br \/>\nIf you have any queries about Spark RDD operations, feel free to share them with us. We will be happy to answer them.<br \/>\n<strong>See Also-<\/strong><\/p>\n<ul>\n<li><strong><a href=\"http:\/\/data-flair.training\/blogs\/spark-sql-tutorial\/\">Spark SQL Introduction<\/a><\/strong><\/li>\n<li><strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-sql-dataframe-tutorial\/\">Apache Spark SQL DataFrame<\/a><\/strong><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>1. Spark RDD Operations Two types of Apache Spark\u00a0RDD operations are- Transformations and Actions. 
A Transformation is a function that produces new RDD from the existing RDDs but when we want to work with&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":42915,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[233,896,946,1907,8126,11342,11348,13021,13022,13026,13099,13104,13139,13142,14897],"class_list":["post-682","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-spark","tag-action","tag-apache-spark","tag-apache-spark-rdds","tag-big-data","tag-learn","tag-rdd-in-apache-spark","tag-rdd-transformation-and-action","tag-spark","tag-spark-scala","tag-spark-api","tag-spark-quickstart","tag-spark-rdd","tag-spark-training","tag-spark-tutorial","tag-transformation"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Spark RDD Operations-Transformation &amp; Action with Example - DataFlair<\/title>\n<meta name=\"description\" content=\"Spark RDD Operations covers what is RDD,how to create RDD in Spark,what is Spark transformation &amp; Spark action,RDD Transformation &amp; Action API with examples\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Spark RDD Operations-Transformation &amp; Action with Example - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Spark RDD Operations covers what is RDD,how to create RDD in Spark,what is Spark transformation &amp; Spark action,RDD Transformation &amp; Action API with examples\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2016-07-26T05:19:08+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-11-20T08:46:40+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"15 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Spark RDD Operations-Transformation &amp; Action with Example - DataFlair","description":"Spark RDD Operations covers what is RDD,how to create RDD in Spark,what is Spark transformation & Spark action,RDD Transformation & Action API with examples","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/","og_locale":"en_US","og_type":"article","og_title":"Spark RDD Operations-Transformation &amp; Action with Example - DataFlair","og_description":"Spark RDD Operations covers what is RDD,how to create RDD in Spark,what is Spark transformation & Spark action,RDD Transformation & Action API with examples","og_url":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2016-07-26T05:19:08+00:00","article_modified_time":"2018-11-20T08:46:40+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"15 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"Spark RDD Operations-Transformation &amp; Action with Example","datePublished":"2016-07-26T05:19:08+00:00","dateModified":"2018-11-20T08:46:40+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/"},"wordCount":2941,"commentCount":21,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01.jpg","keywords":["action","apache spark","Apache Spark RDDs","big data","learn","rdd in apache spark","rdd transformation and action","Spark","spark &amp; Scala","Spark API","spark quickstart","spark rdd","spark training","spark tutorial","transformation"],"articleSection":["Apache Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/","url":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/","name":"Spark RDD Operations-Transformation &amp; Action with Example - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01.jpg","datePublished":"2016-07-26T05:19:08+00:00","dateModified":"2018-11-20T08:46:40+00:00","description":"Spark RDD Operations covers what is RDD,how to create RDD in Spark,what is Spark transformation & Spark action,RDD Transformation & Action API with examples","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Spark-RDD-Operations-01.jpg","width":1200,"height":628,"caption":"Spark RDD Operations-Transformation &amp; Action with Example"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/spark-rdd-operations-transformations-actions\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Apache Spark Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/spark\/"},{"@type":"ListItem","position":3,"name":"Spark RDD Operations-Transformation &amp; Action with 
Example"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The 
DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/682","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=682"}],"version-history":[{"count":7,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/682\/revisions"}],"predecessor-version":[{"id":42917,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/682\/revisions\/42917"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/42915"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=682"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=682"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=682"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}