Why Hadoop MapReduce?

Viewing 2 reply threads
  • Author
    Posts
    • #5694
      DataFlair Team
      Spectator

      Why MapReduce is needed to process the data?
      Why MapReduce came into existence?
      What is the need of MapReduce in Hadoop?

    • #5695
      DataFlair Team
      Spectator

      We are aware that all the data is stored on disk drives. The conundrum is that disk seek times (the latency of a disk operation) have not improved at anywhere near the rate at which transfer rates (a disk's bandwidth) have.

      So if a disk access pattern involves many seeks, it takes longer to read or write large parts of a dataset than it would to stream through the whole thing. For updating a small number of records in a database, a traditional RDBMS works just fine. However, when most of a database needs to be updated, MapReduce is more efficient because it uses sort/merge to rebuild the database in one pass.
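      To see the seek-versus-stream trade-off concretely, here is a rough back-of-envelope sketch in Python. The figures (~10 ms per seek, ~100 MB/s sequential transfer) are assumed typical disk numbers, not values from the post:

```python
# Back-of-envelope: why streaming beats seeking for full-dataset work.
# All figures below are assumptions for illustration, not measurements.
seek_time_s = 0.010        # ~10 ms per random seek
transfer_rate_bps = 100e6  # ~100 MB/s sequential transfer
dataset_bytes = 1e12       # 1 TB dataset
block_bytes = 1e6          # read in 1 MB chunks

# One long sequential read: transfer time only.
stream_s = dataset_bytes / transfer_rate_bps

# Random access: pay a seek before every chunk, plus the same transfer time.
seeks = dataset_bytes / block_bytes
random_s = seeks * seek_time_s + stream_s

print(f"streaming: {stream_s / 3600:.1f} h, random access: {random_s / 3600:.1f} h")
```

      With these numbers the seeks alone double the total time, and the gap widens as chunks get smaller; this is why a full-dataset rewrite via sort/merge can beat many small in-place updates.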

      MapReduce trumps a traditional RDBMS on the following points:
      1) MapReduce is a good fit for problems that need to analyze a complete dataset in batch mode, particularly for ad hoc analysis.
      2) MapReduce suits applications where data is written once and read many times.
      3) MapReduce can process petabytes of data in parallel.

    • #5696
      DataFlair Team
      Spectator

      MapReduce is a programming framework that allows us to perform distributed and parallel processing on large data sets in a distributed environment. In simple terms, MapReduce takes a list of objects and runs some operation over each object in the list (map) to either produce a new list or calculate a single value (reduce).
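      To make the "operate over each object, then combine" idea concrete, here is a toy single-process sketch of the model in Python, using word count (the canonical MapReduce example). The helper names `map_phase` and `reduce_phase` are ours for illustration, not part of Hadoop's API:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    """Apply the mapper to every record, collecting (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def reduce_phase(pairs, reducer):
    """Group pairs by key (the shuffle/sort step), then reduce each group."""
    pairs.sort(key=itemgetter(0))
    return {key: reducer(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))}

# Word count: emit (word, 1) for each word, then sum the 1s per word.
def mapper(line):
    return [(word, 1) for word in line.split()]

def reducer(word, counts):
    return sum(counts)

lines = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(map_phase(lines, mapper), reducer)
print(counts)  # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

      In real Hadoop the map and reduce functions run on many machines at once, and the framework handles the shuffle, sort, and fault tolerance between the two phases.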

      MapReduce enables skilled programmers to write distributed applications without having to worry about the underlying distributed computing infrastructure.

      MapReduce is Hadoop's most mature and powerful framework for data processing, and currently the only production-ready data processing framework available for Hadoop.

      For many problems, especially the kinds you can solve with SQL, Hive and Pig are excellent tools.
      But for wider-ranging tasks such as statistical processing or text extraction, and especially for processing unstructured data, you need to use MapReduce.
