Define Writable data types in Hadoop MapReduce.

    • #6198
      DataFlair Team
      Spectator

      Define Writable data types in Hadoop MapReduce.

    • #6199
      DataFlair Team
      Spectator

      Writable data types form Hadoop's serialization format: they define how data is written to the local disk and transmitted across the network. Just as Java has data types to store variables (int, float, long, double, etc.), Hadoop has its own equivalent data types, called Writable data types. These Writable data types are passed as parameters (input and output key-value pairs) to the mapper and reducer.

      The Writable data types discussed below implement the WritableComparable interface. The Comparable part is used when the reducer sorts the keys, and the Writable part allows the result to be written to the local disk. Hadoop does not use Java's Serializable mechanism because it is too heavyweight for Hadoop's purposes; Writable serializes Hadoop objects in a much more compact way. WritableComparable is simply a combination of the Writable and Comparable interfaces.

      Below is a list of a few data types in Java along with their equivalent Hadoop variants:

      1. Integer –> IntWritable: It is the Hadoop variant of Integer. It is used to pass integer numbers as key or value.
      2. Float –> FloatWritable: Hadoop variant of Float used to pass floating point numbers as key or value.
      3. Long –> LongWritable: Hadoop variant of Long data type to store long values.
      4. Short –> ShortWritable: Hadoop variant of Short data type to store short values.
      5. Double –> DoubleWritable: Hadoop variant of Double to store double values.
      6. String –> Text: Hadoop variant of String to pass string characters as key or value.
      7. Byte –> ByteWritable: Hadoop variant of byte to store a single byte (for a sequence of bytes there is BytesWritable).
      8. null –> NullWritable: Hadoop variant of null to pass null as a key or value. NullWritable is usually used as the data type for the reducer's output key when the output key is not important in the final result.
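
      As a quick illustration, these wrappers can be created and read back in plain Java (a minimal sketch, no cluster needed; the class name is just for the demo):

      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.NullWritable;
      import org.apache.hadoop.io.Text;

      public class WritableTypesDemo {
          public static void main(String[] args) {
              IntWritable count = new IntWritable(42);   // wraps a Java int
              Text word = new Text("hadoop");            // wraps a String
              NullWritable nothing = NullWritable.get(); // singleton, holds no data

              System.out.println(count.get());      // 42
              System.out.println(word.toString());  // hadoop
              count.set(7); // Writables are mutable, so they can be reused
          }
      }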

      Example of a Mapper class using a few of the above data types:

      public static class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable>

      Here the first two data types are the input key and value to the map function, which will be a long value (the byte offset of the line) and a line of text respectively. The last two data types are the intermediate output key and value from the map function, which will be string characters and int numbers respectively.
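
      For instance, a minimal word-count style map method inside this class might look like the following sketch (illustrative only, assuming the usual imports such as java.io.IOException, org.apache.hadoop.io.*, and org.apache.hadoop.mapreduce.Mapper):

      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      @Override
      public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
          // key is the byte offset of the line; value is one line of the input
          for (String token : value.toString().split("\\s+")) {
              word.set(token);
              context.write(word, one); // emit an intermediate (Text, IntWritable) pair
          }
      }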

      Example of a Reducer class:

      public static class WordReducer extends Reducer<Text, IntWritable, NullWritable, Text>

      Here the first two data types are the input key and value to the reduce function, and they must match the intermediate output key and value from the mapper. The last two data types are the output key and value from the reduce function, which form the final result of the MapReduce program.
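
      A matching reduce method for this signature could be sketched as follows (again illustrative, under the same import assumptions as above); it sums the counts for each word and, since the output key is not needed, emits NullWritable as the key:

      @Override
      public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
              sum += val.get(); // add up all intermediate counts for this key
          }
          // the word and its total go into the value; the key is NullWritable
          context.write(NullWritable.get(), new Text(key.toString() + "\t" + sum));
      }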

      Apart from these, we can also write a custom Writable by implementing the Writable interface and overriding its write and readFields methods.
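
      For example, a hypothetical custom Writable holding two fields might look like this (the class and field names are made up for illustration):

      import java.io.DataInput;
      import java.io.DataOutput;
      import java.io.IOException;
      import org.apache.hadoop.io.Writable;

      public class PageVisit implements Writable {
          private String url;
          private int visits;

          public PageVisit() { } // no-arg constructor, needed for deserialization

          @Override
          public void write(DataOutput out) throws IOException {
              out.writeUTF(url);     // write the fields in a fixed order
              out.writeInt(visits);
          }

          @Override
          public void readFields(DataInput in) throws IOException {
              url = in.readUTF();    // read them back in exactly the same order
              visits = in.readInt();
          }
      }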

    • #6202
      DataFlair Team
      Spectator

      Writable is an interface in Hadoop, and the key and value types used in Hadoop must implement this interface. Hadoop provides Writable wrappers for almost all Java primitive types and some other types, but sometimes we need to pass custom objects, and these custom objects should implement Hadoop's Writable interface. MapReduce uses implementations of Writables for interacting with user-provided Mappers and Reducers.

      To implement the Writable interface we must provide two methods:

      public interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
      }

      We use Hadoop Writables because data needs to be transmitted between different nodes in a distributed computing environment. This requires serialization and deserialization of the data, i.e. converting data in structured form into a byte stream and vice versa. Hadoop therefore uses a simple and efficient serialization protocol to serialize data between the map and reduce phases, and the types implementing it are called Writables.
      Some examples of Writables, as already mentioned, are IntWritable, LongWritable, BooleanWritable and FloatWritable.
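
      To make this concrete, here is a small stand-alone sketch (using only standard java.io and org.apache.hadoop.io APIs) that round-trips an IntWritable through a byte stream, which is essentially what Hadoop does when moving data between the map and reduce phases:

      import java.io.*;
      import org.apache.hadoop.io.IntWritable;

      public class RoundTrip {
          public static void main(String[] args) throws IOException {
              IntWritable before = new IntWritable(163);

              // serialization: structured object -> byte stream
              ByteArrayOutputStream bytes = new ByteArrayOutputStream();
              before.write(new DataOutputStream(bytes));

              // deserialization: byte stream -> structured object
              IntWritable after = new IntWritable();
              after.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

              System.out.println(after.get()); // prints 163
          }
      }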
