Why Hadoop MapReduce?

Viewing 2 reply threads
  • Author
    Posts
    • #5694
      DataFlair Team
      Spectator

      Why MapReduce is needed to process the data?
      Why MapReduce came into existence?
      What is the need of MapReduce in Hadoop?

    • #5695
      DataFlair Team
      Spectator

      We are aware that all the data is stored on disk drives. The conundrum is that disk seek times (the latency of a disk operation) have not improved at anywhere near the rate at which transfer rates (a disk's bandwidth) have.

      So if a disk access pattern involves many seeks, it takes longer to read or write large parts of a dataset than it would to stream through the whole thing. For updating a small number of records in a database, a traditional RDBMS works just fine. However, when most of a database needs to be updated, MapReduce is more efficient because it uses sort/merge to rebuild the database in one pass.
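      To see the seek-versus-stream trade-off concretely, here is a rough back-of-envelope sketch in Python. The figures (~10 ms per seek, ~100 MB/s sequential transfer) are assumed typical disk numbers, not values from the post:

```python
# Back-of-envelope: why streaming beats seeking for full-dataset work.
# All figures below are assumptions for illustration, not measurements.
seek_time_s = 0.010        # ~10 ms per random seek
transfer_rate_bps = 100e6  # ~100 MB/s sequential transfer
dataset_bytes = 1e12       # 1 TB dataset
block_bytes = 1e6          # read in 1 MB chunks

# One long sequential read: transfer time only.
stream_s = dataset_bytes / transfer_rate_bps

# Random access: pay a seek before every chunk, plus the same transfer time.
seeks = dataset_bytes / block_bytes
random_s = seeks * seek_time_s + stream_s

print(f"streaming: {stream_s / 3600:.1f} h, random access: {random_s / 3600:.1f} h")
```

      With these numbers the seeks alone double the total time, and the gap widens as chunks get smaller; this is why a full-dataset rewrite via sort/merge can beat many small in-place updates.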

      MapReduce trumps a traditional RDBMS on the following points:
      1) MapReduce is a good fit for problems that need to analyze a complete dataset in batch mode, particularly for ad hoc analysis.
      2) MapReduce suits applications where data is written once and read many times.
      3) MapReduce can process petabytes of data in parallel.

    • #5696
      DataFlair Team
      Spectator

      MapReduce is a programming framework that allows us to perform distributed and parallel processing on large data sets in a distributed environment. In simple terms, MapReduce takes a list of objects and runs some operation over each object in the list (map) to either produce a new list or calculate a single value (reduce).
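      To make the "operate over each object, then combine" idea concrete, here is a toy single-process sketch of the model in Python, using word count (the canonical MapReduce example). The helper names `map_phase` and `reduce_phase` are ours for illustration, not part of Hadoop's API:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    """Apply the mapper to every record, collecting (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def reduce_phase(pairs, reducer):
    """Group pairs by key (the shuffle/sort step), then reduce each group."""
    pairs.sort(key=itemgetter(0))
    return {key: reducer(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))}

# Word count: emit (word, 1) for each word, then sum the 1s per word.
def mapper(line):
    return [(word, 1) for word in line.split()]

def reducer(word, counts):
    return sum(counts)

lines = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(map_phase(lines, mapper), reducer)
print(counts)  # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

      In real Hadoop the map and reduce functions run on many machines at once, and the framework handles the shuffle, sort, and fault tolerance between the two phases.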

      MapReduce enables skilled programmers to write distributed applications without having to worry about the underlying distributed computing infrastructure.

      MapReduce is Hadoop's most mature and powerful framework for data processing, and currently the only production-ready data processing framework available for Hadoop.

      For many problems, especially the kinds you can solve with SQL, Hive and Pig are excellent tools.
      But for wider-ranging tasks such as statistical processing or text extraction, and especially for processing unstructured data, you need to use MapReduce.
