

{"id":2582,"date":"2017-05-13T10:27:50","date_gmt":"2017-05-13T10:27:50","guid":{"rendered":"http:\/\/data-flair.training\/blogs\/?p=2582"},"modified":"2019-01-30T17:01:05","modified_gmt":"2019-01-30T11:31:05","slug":"spark-in-memory-computing","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/","title":{"rendered":"Spark In-Memory Computing &#8211; A Beginners Guide"},"content":{"rendered":"<h2>1. Objective<\/h2>\n<p>This tutorial on <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-tutorial-quickstart-introduction\/\"><strong>Apache Spark<\/strong><\/a> in-memory computing explains what in-memory computing is, introduces Spark in-memory processing, and describes how Apache Spark handles data that does not fit into memory. 
This tutorial also covers the various storage levels in Spark and the benefits of in-memory computation.<\/p>\n<div id=\"attachment_48353\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-48353\" class=\"size-full wp-image-48353\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing.jpg\" alt=\"Spark In-Memory Computing \" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing-1024x536.jpg 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing-520x272.jpg 520w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-48353\" class=\"wp-caption-text\">Spark In-Memory Computing &#8211; A Beginners Guide<\/p><\/div>\n<h2>2. What is Spark In-memory Computing?<\/h2>\n<p>In in-memory computation, data is kept in <em>random access memory (<strong>RAM<\/strong>)<\/em> instead of on slower disk drives and is processed in parallel. This lets us detect patterns and analyze large datasets quickly. It has become popular because the falling cost of memory makes in-memory processing economical for applications. 
The two main pillars of in-memory computation are:<\/p>\n<ul>\n<li>RAM storage<\/li>\n<li>Parallel distributed processing<\/li>\n<\/ul>\n<h2>3. Introduction to Spark In-memory Computing<\/h2>\n<p>Keeping data in memory can improve performance by an order of magnitude. The main abstraction of Spark is the <strong><a href=\"http:\/\/data-flair.training\/blogs\/rdd-in-apache-spark\/\">RDD<\/a><\/strong>, and RDDs are cached using the <strong>cache()<\/strong> or <strong>persist()<\/strong> method.<\/p>\n<p>When we call the <em>cache()<\/em> method, Spark keeps the partitions of the RDD in memory. Partitions that do not fit in memory are either recomputed when they are needed or spilled to disk, depending on the storage level. Once an RDD is cached, later operations can reuse it without reading it from disk again, which reduces computation time and the overhead of disk access.<br \/>\nThe in-memory capability of Spark is well suited to <em>machine learning<\/em> and <em>micro-batch processing<\/em>. It provides faster execution for iterative jobs.<\/p>\n<p>The <em>persist()<\/em> method can also store RDDs in memory so that we can reuse them across parallel operations. The difference between cache() and persist() is that cache() always uses the default storage level, <strong>MEMORY_ONLY<\/strong>, while persist() lets us choose among various storage levels.<\/p>\n<p>Follow this link to <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-rdd-persistence-caching\/\">learn about the Spark RDD persistence and caching mechanism.<\/a><\/p>\n<h2>4. Storage levels of RDD persist() in Spark<\/h2>\n<p>The storage levels available to the persist() method for Apache Spark RDDs are:<\/p>\n<ul>\n<li>MEMORY_ONLY<\/li>\n<li>MEMORY_AND_DISK<\/li>\n<li>MEMORY_ONLY_SER<\/li>\n<li>MEMORY_AND_DISK_SER<\/li>\n<li>DISK_ONLY<\/li>\n<li>MEMORY_ONLY_2 and MEMORY_AND_DISK_2<\/li>\n<\/ul>\n<p>Let&#8217;s discuss the above-mentioned Apache Spark storage levels one by one.<\/p>\n<h3>4.1. 
MEMORY_ONLY<\/h3>\n<div id=\"attachment_2588\" style=\"width: 812px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-in-memory.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2588\" class=\"wp-image-2588 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-in-memory.jpg\" alt=\"spark-in-memory\" width=\"802\" height=\"420\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-in-memory.jpg 802w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-in-memory-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-in-memory-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-in-memory-768x402.jpg 768w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><\/a><p id=\"caption-attachment-2588\" class=\"wp-caption-text\">Spark storage level &#8211; memory only<\/p><\/div>\n<p>At this storage level, Spark stores the RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, the partitions that do not fit are not cached and are recomputed each time they are needed.<\/p>\n<h3>4.2. 
MEMORY_AND_DISK<\/h3>\n<div id=\"attachment_2589\" style=\"width: 812px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-in-memory-and-disk.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2589\" class=\"wp-image-2589 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-in-memory-and-disk.jpg\" alt=\"spark-in-memory-and-disk\" width=\"802\" height=\"420\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-in-memory-and-disk.jpg 802w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-in-memory-and-disk-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-in-memory-and-disk-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-in-memory-and-disk-768x402.jpg 768w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><\/a><p id=\"caption-attachment-2589\" class=\"wp-caption-text\">Spark storage level &#8211; memory and disk<\/p><\/div>\n<p>At this level, the RDD is stored as deserialized Java objects in the JVM. If the full RDD does not fit in memory, the remaining partitions are stored on disk and read from there when needed, instead of being recomputed each time.<\/p>\n<h3>4.3. 
MEMORY_ONLY_SER<\/h3>\n<div id=\"attachment_2590\" style=\"width: 812px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-serialized.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2590\" class=\"wp-image-2590 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-serialized.jpg\" alt=\"spark-inmemory-serialized\" width=\"802\" height=\"420\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-serialized.jpg 802w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-serialized-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-serialized-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-serialized-768x402.jpg 768w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><\/a><p id=\"caption-attachment-2590\" class=\"wp-caption-text\">Spark storage level &#8211; memory only serialized<\/p><\/div>\n<p>This level stores the RDD as serialized Java objects, with one byte array per partition. It is like\u00a0<em>MEMORY_ONLY<\/em> but is more space-efficient, especially when we use a fast serializer, at the cost of more CPU time to read the data.<\/p>\n<h3>4.4. 
MEMORY_AND_DISK_SER<\/h3>\n<div id=\"attachment_2591\" style=\"width: 812px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-and-disk-serialized.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2591\" class=\"wp-image-2591 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-and-disk-serialized.jpg\" alt=\"spark-inmemory-and disk-serialized\" width=\"802\" height=\"420\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-and-disk-serialized.jpg 802w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-and-disk-serialized-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-and-disk-serialized-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-and-disk-serialized-768x402.jpg 768w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><\/a><p id=\"caption-attachment-2591\" class=\"wp-caption-text\">Spark storage level &#8211; memory and disk serialized<\/p><\/div>\n<p>This level stores the RDD as serialized Java objects. If the full RDD does not fit in memory, the remaining partitions are stored on disk instead of being recomputed each time they are needed.<\/p>\n<h3>4.5. 
DISK_ONLY<\/h3>\n<div id=\"attachment_2593\" style=\"width: 812px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-disk-only.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2593\" class=\"wp-image-2593 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-disk-only.jpg\" alt=\"spark-in-memory-disk-only\" width=\"802\" height=\"420\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-disk-only.jpg 802w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-disk-only-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-disk-only-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/spark-inmemory-disk-only-768x402.jpg 768w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><\/a><p id=\"caption-attachment-2593\" class=\"wp-caption-text\">Spark storage level &#8211; disk only<\/p><\/div>\n<p>This storage level stores the RDD partitions only on disk.<\/p>\n<h3>4.6. MEMORY_ONLY_2 and MEMORY_AND_DISK_2<\/h3>\n<p>These levels are like\u00a0<em>MEMORY_ONLY<\/em> and <em>MEMORY_AND_DISK<\/em>. The only difference is that each partition is replicated on two nodes in the cluster.<\/p>\n<p>Follow this link to <a href=\"http:\/\/data-flair.training\/blogs\/important-apache-spark-terminologies-and-concepts-you-must-know\/\">learn more about Spark terminologies and concepts in detail<\/a>.<\/p>\n<h2>5. 
Advantages of In-memory Processing<\/h2>\n<p>After studying the introduction to <strong><a href=\"http:\/\/spark.apache.org\/\">Spark<\/a><\/strong> in-memory computing and its various storage levels in detail, let&#8217;s discuss the advantages of in-memory computation:<\/p>\n<ol>\n<li>When we need data to analyze, it is already available in memory or can be retrieved easily.<\/li>\n<li>It is good for real-time risk management and fraud detection.<\/li>\n<li>The data becomes highly accessible.<\/li>\n<li>The computation speed of the system increases.<\/li>\n<li>It improves complex event processing.<\/li>\n<li>It can cache a large amount of data.<\/li>\n<li>It is economical, as the cost of RAM has fallen over time.<\/li>\n<\/ol>\n<h2>6. Conclusion<\/h2>\n<p>In conclusion, Apache Hadoop enables users to store and process huge amounts of data at very low cost. However, it relies on persistent storage to provide fault tolerance, and its one-pass computation model makes MapReduce a poor fit for low-latency applications and iterative computations, such as machine learning and graph algorithms.<\/p>\n<p>Hence, Apache Spark addresses these <a href=\"http:\/\/data-flair.training\/blogs\/limitations-of-hadoop\/\"><strong>Hadoop drawbacks<\/strong><\/a>\u00a0by generalizing the MapReduce model. It improves both performance and ease of use.<\/p>\n<p>If you like this post or have any query related to Apache Spark in-memory computing, do let us know by leaving a comment.<\/p>\n<p><strong>See Also &#8211;\u00a0<\/strong><a href=\"http:\/\/data-flair.training\/blogs\/limitations-of-apache-spark-overcome-spark-drawbacks\/\">Limitations Of Apache Spark.<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. Objective This tutorial on Apache Spark in-memory computing will provide you the detailed description of what is in memory computing? 
Introduction to Spark in-memory processing and how does Apache Spark process data that&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":48353,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[925,926,6653,6654,11884,13064,13065,13861],"class_list":["post-2582","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-spark","tag-apache-spark-in-memory-computation","tag-apache-spark-in-memory-computing","tag-in-memory-computation-in-spark","tag-in-memory-computing-with-spark","tag-saprk-storage-levels","tag-spark-in-memory-computing","tag-spark-in-memory-processing","tag-storage-levels-in-spark"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Spark In-Memory Computing - A Beginners Guide - DataFlair<\/title>\n<meta name=\"description\" content=\"spark in-memory computing introduction-what is in-memory processing,in-memory computation advantages,spark storage levels,difference between cache &amp; persist\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Spark In-Memory Computing - A Beginners Guide - DataFlair\" \/>\n<meta property=\"og:description\" content=\"spark in-memory computing introduction-what is in-memory processing,in-memory computation advantages,spark storage levels,difference between cache &amp; persist\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta 
property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-05-13T10:27:50+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-01-30T11:31:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Spark In-Memory Computing - A Beginners Guide - DataFlair","description":"spark in-memory computing introduction-what is in-memory processing,in-memory computation advantages,spark storage levels,difference between cache & persist","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/","og_locale":"en_US","og_type":"article","og_title":"Spark In-Memory Computing - A Beginners Guide - DataFlair","og_description":"spark in-memory computing introduction-what is in-memory processing,in-memory computation advantages,spark storage levels,difference between cache & persist","og_url":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2017-05-13T10:27:50+00:00","article_modified_time":"2019-01-30T11:31:05+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"Spark In-Memory Computing &#8211; A Beginners Guide","datePublished":"2017-05-13T10:27:50+00:00","dateModified":"2019-01-30T11:31:05+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/"},"wordCount":828,"commentCount":6,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing.jpg","keywords":["Apache spark in memory computation","Apache spark in memory computing","in memory computation in spark","in memory computing with spark","Saprk storage levels","spark in memory computing","spark in memory processing","Storage levels in spark"],"articleSection":["Apache Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/","url":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/","name":"Spark In-Memory Computing - A Beginners Guide - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing.jpg","datePublished":"2017-05-13T10:27:50+00:00","dateModified":"2019-01-30T11:31:05+00:00","description":"spark in-memory computing introduction-what is in-memory processing,in-memory computation advantages,spark storage levels,difference between cache & persist","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Spark-In-Memory-Computing.jpg","width":1200,"height":628,"caption":"Spark In-Memory Computing - A Beginners Guide"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/spark-in-memory-computing\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Apache Spark Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/spark\/"},{"@type":"ListItem","position":3,"name":"Spark In-Memory Computing &#8211; A Beginners Guide"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. 
Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2582","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=2582"}],"version-history":[{"count":8,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2582\/revisions"}],"predecessor-version":[{"id":48354,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2582\/revisions\/48354"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/48353"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=2582"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=2582"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=2582"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}