

{"id":22410,"date":"2018-08-09T08:00:16","date_gmt":"2018-08-09T08:00:16","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=22410"},"modified":"2018-08-09T08:00:16","modified_gmt":"2018-08-09T08:00:16","slug":"avro-serialization","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/avro-serialization\/","title":{"rendered":"Avro Serialization | Serialization In Java &amp; Hadoop"},"content":{"rendered":"<p>Today, we will learn <strong>Avro<\/strong> Serialization in detail. This tutorial covers the serialization encodings in Avro, gives a brief overview of serialization in<strong> Java<\/strong>, and then covers serialization in Hadoop in detail.<\/p>\n<p>We will also look at the advantages and disadvantages of<strong> Hadoop<\/strong> serialization compared with Java serialization.<\/p>\n<p><span style=\"font-weight: 400\">So, let\u2019s begin with the introduction to Avro Serialization.<\/span><\/p>\n<div id=\"attachment_24643\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-24643\" class=\"wp-image-24643 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1.jpg\" alt=\"Avro Serialization | Serialization In Java &amp; Hadoop\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1-300x157.jpg 300w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-24643\" class=\"wp-caption-text\">Avro Serialization | Serialization In Java &amp; Hadoop<\/p><\/div>\n<h2><span style=\"font-weight: 400\">What is Avro Serialization?<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Serialization is the process of translating data structures or object state into a binary or textual form so that the data can be transported over a network or stored on persistent storage. When Avro performs this translation, we call it <strong>Serialization in Avro<\/strong>.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Once the data has been transported over the network or retrieved from persistent storage, it must be deserialized again. Serialization is also known as <em>marshalling<\/em>, and deserialization is also known as <em>unmarshalling<\/em>.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Avro data is always serialized together with its <strong>schema<\/strong>. Files that store Avro data must also include the schema for that data in the same file.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Likewise, Avro-based Remote Procedure Call (RPC) systems must guarantee that the remote recipients of data have a copy of the schema that was used to write that data.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Because the schema used to write the data is always available when the data is read, Avro data itself is not tagged with type information. 
A schema is therefore required to parse the data.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In general, both serialization and deserialization proceed as a depth-first, left-to-right traversal of the schema, serializing primitive types as they are encountered.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Encodings in Avro Serialization<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Avro specifies two serialization encodings: the binary encoding and the <em>JSON encoding<\/em>. Most applications use the binary encoding, since it is smaller and faster.<\/span><\/p>\n<p><span style=\"font-weight: 400\">For debugging and web-based applications, however, the JSON encoding may sometimes be appropriate. So, let\u2019s learn both encodings in detail:<\/span><\/p>\n<div id=\"attachment_23006\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Encodings-in-Avro-Serialization-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23006\" class=\"wp-image-23006 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Encodings-in-Avro-Serialization-01.jpg\" alt=\"Avro Serialization\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Encodings-in-Avro-Serialization-01.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Encodings-in-Avro-Serialization-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Encodings-in-Avro-Serialization-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Encodings-in-Avro-Serialization-01-768x402.jpg 768w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Encodings-in-Avro-Serialization-01-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-23006\" class=\"wp-caption-text\">Encodings in Avro Serialization<\/p><\/div>\n<h3><span style=\"font-weight: 400\">a. Binary Encoding in Avro<\/span><\/h3>\n<p><strong>Primitive Types<\/strong><\/p>\n<p><span style=\"font-weight: 400\">In the binary encoding, primitive types are encoded as follows:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The null type is written as zero bytes.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">A boolean is written as a single byte whose value is either 0 (false) or 1 (true).<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Values of type int and long are written using variable-length zig-zag coding.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Examples of the variable-length zig-zag encoding of int and long values:<\/span><\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>value<\/strong><\/td>\n<td><strong>hex<\/strong><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400\">0<\/span><\/td>\n<td><span style=\"font-weight: 400\">00<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400\">-1<\/span><\/td>\n<td><span style=\"font-weight: 400\">01<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400\">1<\/span><\/td>\n<td><span style=\"font-weight: 400\">02<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400\">-2<\/span><\/td>\n<td><span style=\"font-weight: 400\">03<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400\">2<\/span><\/td>\n<td><span style=\"font-weight: 400\">04<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400\">-64<\/span><\/td>\n<td><span style=\"font-weight: 400\">7f<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400\">64<\/span><\/td>\n<td><span 
style=\"font-weight: 400\">80 01<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">A float is written as 4 bytes: the float is converted into a 32-bit integer using a method equivalent to Java&#8217;s floatToIntBits and then encoded in little-endian format.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">A double is written as 8 bytes: the double is converted into a 64-bit integer using a method equivalent to Java&#8217;s doubleToLongBits and then encoded in little-endian format.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">A bytes value is encoded as a long followed by that many bytes of data.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">A string is encoded as a long followed by that many bytes of UTF-8 encoded character data.<\/span><\/li>\n<\/ul>\n<h3><span style=\"font-weight: 400\">b. JSON Encoding in Avro<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Except for unions, the JSON encoding is the same as the encoding used for field default values.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In JSON, the value of a union is encoded as follows:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">If its type is null, it is encoded as a JSON null.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Otherwise, it is encoded as a JSON object with one name\/value pair, whose name is the type&#8217;s name and whose value is the recursively encoded value. 
For Avro&#8217;s named types (record, fixed or enum), the user-specified name is used; for other types, the type name is used.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">For example, the union schema [\"null\",\"string\",\"Foo\"], where Foo is a record name, would encode:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Null as null;<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The string \"a\" as {\"string\": \"a\"}; and<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">A Foo instance as {\"Foo\": {&#8230;}}, where {&#8230;} indicates the JSON encoding of a Foo instance.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Note that a schema is still required to correctly process JSON-encoded data. For example, the JSON encoding does not distinguish between int and long, float and double, records and maps, or enums and strings.<\/span><\/p>\n<h4><strong>i. Single-object encoding<\/strong><\/h4>\n<p><span style=\"font-weight: 400\">In some situations, a single Avro-serialized object needs to be stored for a long period of time. A very common example is storing Avro records for several weeks in an Apache Kafka topic.<\/span><\/p>\n<h4>ii. 
Single-object encoding specification<\/h4>\n<p><span style=\"font-weight: 400\">A single-object encoding consists of:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">A two-byte marker, C3 01, to show that the message is Avro and uses this single-record format (version 1).<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The 8-byte little-endian CRC-64-AVRO fingerprint of the object&#8217;s schema.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The Avro object, encoded using Avro&#8217;s binary encoding.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Implementations use the 2-byte marker to determine whether a payload is Avro. This check helps avoid expensive lookups that resolve the schema from a fingerprint when the message is not an encoded Avro payload.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Serialization in Java<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Java provides a mechanism known as object serialization, in which an object is represented as a sequence of bytes that includes the object&#8217;s data as well as information about the object&#8217;s type and the types of data stored in the object.<\/span><\/p>\n<p><span style=\"font-weight: 400\">After a serialized object has been written into a file, it can be read from the file and deserialized. 
In Java, the ObjectOutputStream and ObjectInputStream classes are used to serialize and deserialize an object, respectively.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Serialization in Hadoop<\/span><\/h2>\n<p><span style=\"font-weight: 400\">In distributed systems like Hadoop, serialization is used mainly for interprocess communication and persistent storage.<\/span><\/p>\n<div id=\"attachment_23004\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Avro-Serialization-in-Hadoop-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23004\" class=\"wp-image-23004 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Avro-Serialization-in-Hadoop-01.jpg\" alt=\"Avro Serialization\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Avro-Serialization-in-Hadoop-01.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Avro-Serialization-in-Hadoop-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Avro-Serialization-in-Hadoop-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Avro-Serialization-in-Hadoop-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Avro-Serialization-in-Hadoop-01-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-23004\" class=\"wp-caption-text\">Avro Serialization in Hadoop<\/p><\/div>\n<h3><span style=\"font-weight: 400\">a. 
Interprocess Communication<\/span><\/h3>\n<ol>\n<li><span style=\"font-weight: 400\">Interprocess communication between the nodes connected in a network is established using the Remote Procedure Call (RPC) technique.<\/span><\/li>\n<li><span style=\"font-weight: 400\">RPC uses internal serialization to convert the message into a binary format before sending it to the remote node over the network. At the other end, the remote system deserializes the binary stream back into the original message.<\/span><\/li>\n<li><span style=\"font-weight: 400\">The RPC serialization format is required to be:<\/span><\/li>\n<\/ol>\n<ul>\n<li style=\"font-weight: 400\"><strong>Compact<\/strong><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">To make the best use of network bandwidth, which is the scarcest resource in a data center.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><strong>Fast<\/strong><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Since communication between nodes is crucial in distributed systems, serialization and deserialization should be quick and have little overhead.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><strong>Extensible<\/strong><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Protocols change over time to meet new requirements, so it should be straightforward to evolve the protocol in a controlled manner for clients and servers.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><strong>Interoperable<\/strong><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The message format should support nodes that are written in different languages.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">b. Persistent Storage<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Persistent storage is digital storage that does not lose its data when the power supply is lost. 
Examples include magnetic disks and hard disk drives.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">The Writable Interface<\/span><\/h2>\n<p><span style=\"font-weight: 400\">In Hadoop, the Writable interface provides two methods, one for deserialization and one for serialization:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><strong>void readFields(DataInput in)<\/strong><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">This method deserializes the object&#8217;s fields from the given DataInput.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><strong>void write(DataOutput out)<\/strong><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">This method serializes the object&#8217;s fields to the given DataOutput.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">WritableComparable Interface<\/span><\/h2>\n<p><span style=\"font-weight: 400\">The WritableComparable interface is the combination of two interfaces: Writable and Comparable.<\/span><\/p>\n<p><span style=\"font-weight: 400\">It extends Java&#8217;s Comparable interface and Hadoop&#8217;s Writable interface, so it offers methods for data serialization, deserialization, and comparison.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The comparison method is:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><strong>int compareTo(class obj)<\/strong><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">The int compareTo(class obj) method compares the current object with the given object obj.<\/span><br \/>\n<span style=\"font-weight: 400\">Hadoop also provides a number of wrapper classes that implement the WritableComparable interface.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Each of these classes wraps a Java primitive type. 
The following figure shows the Hadoop serialization class hierarchy:<\/span><\/p>\n<div id=\"attachment_22419\" style=\"width: 1063px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Hadoop-Serialization-Hierarchy.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-22419\" class=\"wp-image-22419 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Hadoop-Serialization-Hierarchy.png\" alt=\"Avro Serialization\" width=\"1053\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Hadoop-Serialization-Hierarchy.png 1053w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Hadoop-Serialization-Hierarchy-150x89.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Hadoop-Serialization-Hierarchy-300x179.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Hadoop-Serialization-Hierarchy-768x458.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Hadoop-Serialization-Hierarchy-1024x611.png 1024w\" sizes=\"auto, (max-width: 1053px) 100vw, 1053px\" \/><\/a><p id=\"caption-attachment-22419\" class=\"wp-caption-text\">Avro Serialization &#8211; WritableComparable Interface<\/p><\/div>\n<p><span style=\"font-weight: 400\">These classes are useful for serializing various types of data in Hadoop.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">IntWritable Class<\/span><\/h2>\n<p><span style=\"font-weight: 400\">The IntWritable class implements the Writable, Comparable, and WritableComparable interfaces. It wraps an int value and offers the following constructors and methods for serializing and deserializing integer data:<\/span><br \/>\n<strong>a. 
Constructors<\/strong><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">IntWritable()<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">IntWritable(int value)<\/span><\/li>\n<\/ul>\n<p><strong>b. Methods<\/strong><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">int get()<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">This method returns the integer value held by the current object.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">void readFields(DataInput in)<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">This method deserializes the data from the given DataInput object.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">void set(int value)<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">This method sets the value of the current IntWritable object.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">void write(DataOutput out)<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">This method serializes the data in the current object to the given DataOutput object.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Serializing the Data in Hadoop<\/span><\/h2>\n<p><span style=\"font-weight: 400\">To serialize integer data in Hadoop, the procedure is:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">First, instantiate the IntWritable class, wrapping an integer value in it.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Next, instantiate the ByteArrayOutputStream class.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Then instantiate the DataOutputStream class, passing the ByteArrayOutputStream object to its constructor.<\/span><\/li>\n<li style=\"font-weight: 400\"><span 
style=\"font-weight: 400\">Serialize the integer value held by the IntWritable object by calling its write() method with the DataOutputStream object as the argument.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The serialized data is now stored in the ByteArrayOutputStream object that was passed to the DataOutputStream constructor, and its toByteArray() method returns the data as a byte array.<\/span><\/li>\n<\/ul>\n<h2><span style=\"font-weight: 400\">Deserializing the Data in Hadoop<\/span><\/h2>\n<p><span style=\"font-weight: 400\">After serialization, the procedure to deserialize the integer data is:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">First, instantiate the IntWritable class; the no-argument constructor IntWritable() is enough, because the value will be overwritten by the deserialized data.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Next, instantiate the ByteArrayInputStream class, passing it the byte array that holds the serialized data.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Then instantiate the DataInputStream class, passing the ByteArrayInputStream object to its constructor.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Deserialize the data in the DataInputStream object by calling the readFields() method of the IntWritable object.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The deserialized data is now stored in the IntWritable object and can be retrieved with its get() method.<\/span><\/li>\n<\/ul>\n<h2><span style=\"font-weight: 400\">Advantage of Hadoop Over Java Serialization<\/span><\/h2>\n<p><span style=\"font-weight: 400\">By reusing Writable objects, Hadoop\u2019s Writable-based serialization can reduce object-creation overhead. 
Java\u2019s native serialization framework cannot do this.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Disadvantages of Hadoop Serialization<\/span><\/h2>\n<p><span style=\"font-weight: 400\">There are two ways to serialize Hadoop data:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">The Writable classes, provided by Hadoop\u2019s native library.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">SequenceFiles, which store the data in binary format.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">However, the main disadvantage of both mechanisms is that Writables and SequenceFiles have only a<strong> Java API<\/strong>, so they cannot be written or read in any other language.<\/span><\/p>\n<p><span style=\"font-weight: 400\">This confines Hadoop data to the Java ecosystem. To address this drawback, Doug Cutting created Avro, a language-independent data serialization system.<\/span><\/p>\n<p>So, this was all about Apache Avro serialization. We hope you liked our explanation.<\/p>\n<h2><span style=\"font-weight: 400\">Conclusion: Avro Serialization<\/span><\/h2>\n<p><span style=\"font-weight: 400\">In this Avro Serialization tutorial, we looked at the encodings used in Avro serialization, serialization in Java, and serialization in Hadoop. We also discussed the advantages and disadvantages of Hadoop serialization.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Along the way, we covered the Writable interface and the IntWritable class. If you have any doubts about serialization in Apache Avro, feel free to ask in the comment section. We are happy to help.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, we will learn Avro Serialization in detail. 
It includes Serialization Encodings in Avro, brief knowledge on Avro Serialization in Java and also we will\u00a0cover Avro Serialization in Hadoop in detail. Also, we will&#46;&#46;&#46;<\/p>\n","protected":false},"author":7,"featured_media":24643,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12],"tags":[283,742,1316,3783,5328,7163,12738,12739,12740,12744,12746,14696,15604,16291],"class_list":["post-22410","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-avro","tag-advantage-of-hadoop-serialization","tag-apache-avro-serialization","tag-avro-serialization","tag-deserializing-the-data-in-hadoop","tag-hadoop-serialization","tag-intwritable-class","tag-serialization-in-avro","tag-serialization-in-hadoop","tag-serialization-in-java","tag-serializing-avro","tag-serializing-the-data-in-hadoop","tag-the-writable-interface","tag-what-is-avro-serialization","tag-writablecomparable-interface"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Avro Serialization | Serialization In Java &amp; Hadoop - DataFlair<\/title>\n<meta name=\"description\" content=\"Avro Serialization,serialization in Java,Serialization in Hadoop,what is Serialization,Hadoop Serialization advantage disadvantage,Apache Avro Serialization\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/avro-serialization\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Avro Serialization | Serialization In Java &amp; Hadoop - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Avro Serialization,serialization in Java,Serialization in Hadoop,what is 
Serialization,Hadoop Serialization advantage disadvantage,Apache Avro Serialization\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/avro-serialization\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-08-09T08:00:16+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Avro Serialization | Serialization In Java &amp; Hadoop - DataFlair","description":"Avro Serialization,serialization in Java,Serialization in Hadoop,what is Serialization,Hadoop Serialization advantage disadvantage,Apache Avro Serialization","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/avro-serialization\/","og_locale":"en_US","og_type":"article","og_title":"Avro Serialization | Serialization In Java &amp; Hadoop - DataFlair","og_description":"Avro Serialization,serialization in Java,Serialization in Hadoop,what is Serialization,Hadoop Serialization advantage disadvantage,Apache Avro Serialization","og_url":"https:\/\/data-flair.training\/blogs\/avro-serialization\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-08-09T08:00:16+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/avro-serialization\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/avro-serialization\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd"},"headline":"Avro Serialization | Serialization In Java &amp; Hadoop","datePublished":"2018-08-09T08:00:16+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/avro-serialization\/"},"wordCount":1911,"commentCount":0,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/avro-serialization\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1.jpg","keywords":["advantage of Hadoop Serialization","Apache Avro Serialization","Avro Serialization","Deserializing the Data in Hadoop","Hadoop Serialization","IntWritable Class","Serialization in Avro","Serialization in hadoop","Serialization in Java","serializing Avro","Serializing the Data in Hadoop","The Writable Interface","What is Avro Serialization?","WritableComparable Interface"],"articleSection":["AVRO Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/avro-serialization\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/avro-serialization\/","url":"https:\/\/data-flair.training\/blogs\/avro-serialization\/","name":"Avro Serialization | Serialization In Java &amp; Hadoop - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/avro-serialization\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/avro-serialization\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1.jpg","datePublished":"2018-08-09T08:00:16+00:00","description":"Avro Serialization,serialization in Java,Serialization in Hadoop,what is Serialization,Hadoop Serialization advantage disadvantage,Apache Avro Serialization","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/avro-serialization\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/avro-serialization\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/avro-serialization\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/08\/Avro-Serialization-01-1-1.jpg","width":1200,"height":628,"caption":"Avro Serialization | Serialization In Java &amp; Hadoop"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/avro-serialization\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"AVRO Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/avro\/"},{"@type":"ListItem","position":3,"name":"Avro Serialization | Serialization In Java &amp; Hadoop"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. 
Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"DataFlair Team specializes in creating clear, actionable content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Backed by industry expertise, we make learning easy and career-oriented for beginners and pros alike.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam3\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/22410","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=22410"}],"version-history":[{"count":0,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/22410\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/24643"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=22410"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=22410"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=22410"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}