Define in-memory processing in Spark

Viewing 1 reply thread
  • Author
    Posts
    • #6429
      DataFlair Team
      Spectator

      What is meant by in-memory processing in Spark?

    • #6430
      DataFlair Team
      Spectator

      First, let's understand in-memory computing in general.
      In-memory computing means that data is kept in random access memory (RAM) instead of on slow disk drives, and is processed in parallel. This makes it possible to detect patterns and analyze large datasets quickly. As the cost of memory has fallen, the approach has become economical for many applications, which is why it has grown popular. In-memory computing rests on two main pillars:

      1. RAM storage
      2. Parallel distributed processing

      Now let's discuss in-memory computing in Apache Spark.

      Storing data in memory improves performance by an order of magnitude. Spark's main abstraction is the RDD (Resilient Distributed Dataset), and an RDD is cached by calling its cache() or persist() method.
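      As a minimal sketch of what this looks like (assuming a running SparkSession named `spark` and a hypothetical input path):

      ```scala
      // Minimal sketch: caching an RDD so repeated actions reuse the
      // in-memory copy instead of re-reading from disk.
      // Assumes a SparkSession called `spark` is already available;
      // the input path is hypothetical.
      val lines = spark.sparkContext.textFile("hdfs:///data/logs.txt")
      val errors = lines.filter(_.contains("ERROR"))

      errors.cache() // mark the RDD for in-memory storage (MEMORY_ONLY)

      // The first action computes the partitions and caches them...
      val total = errors.count()
      // ...subsequent actions read them straight from RAM.
      val fatal = errors.filter(_.contains("FATAL")).count()
      ```

      Note that cache() is lazy: nothing is stored until the first action (here, count()) actually materializes the RDD.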

      When we use the cache() method, the RDD's partitions are stored in memory. Data that does not fit in memory is either recomputed from its lineage when needed or, depending on the storage level, spilled to disk. Cached RDDs can then be reused whenever we want without going back to disk, which reduces disk I/O overhead and execution time.

      Spark's in-memory capability is especially valuable for machine learning and micro-batch processing, since iterative jobs that reuse the same data execute much faster.

      RDDs can also be stored in memory with the persist() method, and the persisted data can be reused across parallel operations. There is only one difference between cache() and persist(): with persist() we can choose among various storage levels, while cache() always uses the default storage level, MEMORY_ONLY.
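      The difference can be sketched as follows (again assuming a SparkSession named `spark` is available):

      ```scala
      import org.apache.spark.storage.StorageLevel

      val rdd = spark.sparkContext.parallelize(1 to 1000000)

      // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY):
      // partitions that don't fit in RAM are recomputed when needed.
      rdd.cache()
      rdd.unpersist() // drop the cached copy before re-persisting

      // persist() lets us pick a different storage level, e.g. spill
      // excess partitions to disk instead of recomputing them.
      rdd.persist(StorageLevel.MEMORY_AND_DISK)
      ```

      Choosing MEMORY_AND_DISK trades some disk I/O for avoiding recomputation, which pays off when the RDD is expensive to rebuild.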

      To understand more, check the link: Spark In-Memory Computing – A Beginners Guide
