Avro Schema – Schema Declaration & Schema Resolution

1. Avro Schema – Objective

In this Apache Avro tutorial, we look at the Avro schema. We discuss schema declaration and schema resolution, learn how to create an Avro schema using JSON, and cover the data types a schema can use, i.e. primitive and complex data types. Along the way, we work through an Avro schema example.

So, let’s start Apache Avro Schema.

2. What is Avro Schema?

Avro relies on schemas. When Avro data is read, the schema that was used to write it is always present. This allows each datum to be written with no per-value overhead, which makes serialization both fast and small.
When Avro data is stored in a file, its schema is stored with it, so that the file can be processed later by any program. If the program reading the data expects a different schema, the difference can be resolved easily, because both schemas are present.

When Avro is used in RPC, the client and server exchange schemas in the connection handshake. Since the client and the server each have the other’s full schema, the correspondence between same-named fields, missing fields, extra fields, etc. can be resolved easily.
Avro schemas are defined with JSON, which facilitates implementation in languages that already have JSON libraries.
Avro takes schemas as input and follows its own standard for defining them, rather than reusing one of the existing schema languages. A record schema consists of the following details −

  1. Type of the schema (“record” for a record)
  2. Namespace in which the record resides
  3. Name of the record
  4. Fields in the record with their corresponding data types

With the help of these schemas, serialized values can be stored in a compact binary format. Note that the values are stored without per-value metadata such as field names.
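
To make the space saving concrete, here is a minimal Python sketch of how a record's values might be laid out in Avro's binary encoding: an int or long is written as a zigzag-encoded variable-length integer, a string as its byte length followed by its UTF-8 bytes, and a record as its field values in schema order. The helper names and the sample record (a string field followed by an int field) are assumptions made for illustration; this is not the API of any Avro library.

```python
def encode_long(n: int) -> bytes:
    """Zigzag-encode a signed 64-bit integer, then write it as a varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes get small codes
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Length (as a long) followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(name: str, age: int) -> bytes:
    """Hypothetical record: a string field followed by an int field.

    Field values are simply concatenated in schema order; no field
    names are written.
    """
    return encode_string(name) + encode_long(age)


encoded = encode_record("Ann", 20)
print(encoded.hex())  # 06416e6e28
```

The whole record takes five bytes here: one for the string length, three for the characters, and one for the age.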

3. Schema Declaration/Creating Avro Schemas Using JSON

An Avro schema is created in JavaScript Object Notation (JSON), a lightweight text-based data interchange format. A schema can be represented in JSON in one of the following ways −

  1. A JSON string, naming a defined type
  2. A JSON object, of the form {"type": "typeName", ...attributes...}
  3. A JSON array, representing a union of embedded types
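
The three forms can be told apart with nothing more than a JSON parser. The sketch below uses Python's standard json module; the function name schema_form and the labels it returns are made up for this illustration.

```python
import json


def schema_form(schema_json: str) -> str:
    """Classify a schema by the JSON form it is written in."""
    schema = json.loads(schema_json)
    if isinstance(schema, str):
        return "JSON string"
    if isinstance(schema, dict):
        return "JSON object"
    if isinstance(schema, list):
        return "JSON array"
    raise ValueError("a schema must be a JSON string, object, or array")


print(schema_form('"string"'))                           # JSON string
print(schema_form('{"type": "array", "items": "int"}'))  # JSON object
print(schema_form('["null", "string"]'))                 # JSON array
```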

a. Avro Schema Example

The schema below defines a document of type record within the “DataFlair” namespace. The record’s name is “Student”, and it consists of two fields, Name and Age.

{
  "type" : "record",
  "namespace" : "DataFlair",
  "name" : "Student",
  "fields" : [
     { "name" : "Name" , "type" : "string" },
     { "name" : "Age" , "type" : "int" }
  ]
}

As we can see, there are four attributes in this schema −

  • type

It describes the document type, in this case a “record”.

  • namespace

It gives the name of the namespace in which the object resides.

  • name

It gives the name of the schema.

  • fields

It is an array of field definitions, each of which consists of:

  • name

It describes the name of the field.

  • type

It describes the data type of the field.
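
Because the schema is plain JSON, these attributes can be read back with any JSON library. As a small sketch, the snippet below loads the Student schema with Python's json module; the variable names are our own.

```python
import json

STUDENT_SCHEMA = """
{
  "type" : "record",
  "namespace" : "DataFlair",
  "name" : "Student",
  "fields" : [
     { "name" : "Name" , "type" : "string" },
     { "name" : "Age" , "type" : "int" }
  ]
}
"""

schema = json.loads(STUDENT_SCHEMA)

# The namespace qualifies the record name.
full_name = f'{schema["namespace"]}.{schema["name"]}'

# Each entry of "fields" carries a name and a type.
field_types = {field["name"]: field["type"] for field in schema["fields"]}

print(full_name)    # DataFlair.Student
print(field_types)  # {'Name': 'string', 'Age': 'int'}
```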

b. Primitive Types

Here is the list of primitive type names in Apache Avro Schema:

  1. null: no value.
  2. boolean: a binary value.
  3. int: 32-bit signed integer.
  4. long: 64-bit signed integer.
  5. float: single precision (32-bit) IEEE 754 floating-point number.
  6. double: double precision (64-bit) IEEE 754 floating-point number.
  7. bytes: a sequence of 8-bit unsigned bytes.
  8. string: a Unicode character sequence.

Primitive types have no specified attributes.
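
As a rough illustration of what these types admit, the sketch below checks a Python value against a primitive type name. The mapping from Avro types to Python types (for example, float and double both to Python's float) is an assumption of this example, not something Avro prescribes.

```python
def matches_primitive(type_name: str, value) -> bool:
    """Return True if value is acceptable for the given primitive type."""
    if type_name == "null":
        return value is None
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name in ("int", "long"):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        bits = 32 if type_name == "int" else 64
        return -(2 ** (bits - 1)) <= value < 2 ** (bits - 1)
    if type_name in ("float", "double"):
        return isinstance(value, float)
    if type_name == "bytes":
        return isinstance(value, bytes)
    if type_name == "string":
        return isinstance(value, str)
    raise ValueError(f"unknown primitive type: {type_name}")


print(matches_primitive("int", 2**31 - 1))  # True
print(matches_primitive("int", 2**31))      # False: needs a long
print(matches_primitive("long", 2**31))     # True
```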

c. Complex Types

Apache Avro Schema supports six kinds of complex types: records, enums, arrays, maps, unions, and fixed.

i. Avro Schema Records

A record uses the type name “record” and supports the following attributes:

  • name

It is a JSON string which describes the name of the record (required).

  • namespace

This is a JSON string which qualifies the name (optional).

  • doc

It is a JSON string which provides documentation to the user of this schema (optional).

  • aliases

This is a JSON array of strings, which describes the alternate names for this record (optional).

  • fields

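It is a JSON array, listing fields (required). Each field is a JSON object with a name and a type, and optionally further attributes such as doc and default.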

ii. Avro Schema Enums

An enum uses the type name “enum” and supports the following attributes:

  • name

It is a JSON string which provides the name of the enum (required).

  • namespace

This is a JSON string which qualifies the name (optional).

  • aliases

It is a JSON array of strings, which provides the alternate names for this enum (optional).

  • doc

This is a JSON string which provides documentation to the user of this schema (optional).

  • symbols

It is a JSON array, listing symbols, as JSON strings (required). All the symbols in an enum must be unique; duplicates are prohibited.
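
The sketch below shows how the required attributes of an enum schema might be checked. As we understand the Avro specification, symbols must be unique and must match the pattern [A-Za-z_][A-Za-z0-9_]*; the function name check_enum and the sample “Suit” enum are hypothetical.

```python
import re

# Symbol pattern as we understand it from the Avro specification.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_enum(schema: dict) -> None:
    """Raise ValueError if the enum schema breaks one of the rules."""
    if schema.get("type") != "enum":
        raise ValueError("not an enum schema")
    if "name" not in schema or "symbols" not in schema:
        raise ValueError("an enum requires both name and symbols")
    symbols = schema["symbols"]
    for symbol in symbols:
        if not isinstance(symbol, str) or not NAME_PATTERN.fullmatch(symbol):
            raise ValueError(f"invalid symbol: {symbol!r}")
    if len(set(symbols)) != len(symbols):
        raise ValueError("duplicate symbols are not allowed")


# Hypothetical enum used only for this example.
suit = {
    "type": "enum",
    "name": "Suit",
    "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
}
check_enum(suit)  # passes silently
```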

iii. Arrays in Avro Schema

It uses the type name “array” and supports only one attribute:

  • items

It is simply the schema of the array’s items.

iv. Avro Schema Maps

It uses the type name “map” and supports only one attribute:

  • values

It is the schema of the map’s values. Map keys are assumed to be strings.

v. Unions in Avro Schema

Unions are represented using JSON arrays. For example, ["null", "string"] declares a schema which may be either a null or a string. Unions may not immediately contain other unions.
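
The word “immediately” matters: a union may still appear deeper inside one of its branches, for example as the item schema of an array. A minimal sketch of the check, with a function name of our own choosing:

```python
def check_union(branches: list) -> None:
    """Raise ValueError if a union directly contains another union."""
    for branch in branches:
        if isinstance(branch, list):
            raise ValueError("a union may not immediately contain another union")


check_union(["null", "string"])

# Allowed: the inner union is the array's item schema, not a direct branch.
check_union(["null", {"type": "array", "items": ["int", "string"]}])
```

Passing ["null", ["int", "string"]] to check_union would raise a ValueError, because the second branch is itself a union.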

vi. Fixed Avro Schema

A fixed uses the type name “fixed” and supports the following attributes:

  • name

It is a string naming this fixed (required).

  • namespace

This is a string which qualifies the name (optional).

  • size

It is an integer, specifying the number of bytes per value (required).

4. Schema Resolution

At times, the schema that is present may not be exactly the schema that was expected. For example, if the data was written with a different version of the software than the one reading it, records may have had fields added or removed.
The schema used to write the data is called the writer’s schema, and the schema that the application expects is the reader’s schema. The differences between the two are resolved as follows:

A. If the two schemas do not match, this is an error.
To match, one of the following statements must hold:

  1. Both schemas are arrays whose item types match.
  2. Both schemas are maps whose value types match.
  3. Both schemas are enums whose names match.
  4. Both schemas are fixed whose sizes and names match.
  5. Both schemas are records with the same name.
  6. Either schema is a union.
  7. Both schemas have the same primitive type.
  8. The writer’s schema may be promoted to the reader’s as follows:
  • int is promotable to long, float, or double.
  • long is promotable to float or double.
  • float is promotable to double.
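
Rules 7 and 8 fit in a small lookup table. The following Python sketch covers only primitive type names; PROMOTIONS and primitives_match are names chosen for this example.

```python
# Writer's type -> reader's types it may be promoted to (rule 8).
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
}


def primitives_match(writer: str, reader: str) -> bool:
    """True if both types are the same or the writer's is promotable."""
    return writer == reader or reader in PROMOTIONS.get(writer, set())


print(primitives_match("int", "double"))   # True
print(primitives_match("long", "int"))     # False: promotion is one-way
print(primitives_match("string", "string"))  # True
```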

B. If both are records

  1. The fields are matched by name; the ordering of fields may be different.
  2. The schemas for fields with the same name in both records are resolved recursively.
  3. If the writer’s record contains a field with a name not present in the reader’s record, the writer’s value for that field is ignored.
  4. If the reader’s record schema has a field that contains a default value, and the writer’s schema does not have a field with the same name, then the reader should use the default value from its field.
  5. If the reader’s record schema has a field with no default value, and the writer’s schema does not have a field with the same name, an error occurs.
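
The record rules can be sketched in a few lines of Python. This is a simplified illustration, not how an Avro library implements resolution: it works on already-decoded dictionaries, assumes same-named fields have identical types (so the recursive step of rule 2 is left out), and the Email field with its default is hypothetical.

```python
def resolve_record(writer_schema: dict, reader_schema: dict, datum: dict) -> dict:
    """Project a datum written with writer_schema onto reader_schema."""
    writer_fields = {field["name"] for field in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Rule 1: matched by name, regardless of field order.
            result[name] = datum[name]
        elif "default" in field:
            # Rule 4: missing in the writer's schema, reader has a default.
            result[name] = field["default"]
        else:
            # Rule 5: missing in the writer's schema and no default.
            raise ValueError(f"no value and no default for field {name!r}")
    # Rule 3: writer fields unknown to the reader are never copied.
    return result


writer = {"type": "record", "name": "Student", "fields": [
    {"name": "Name", "type": "string"},
    {"name": "Age", "type": "int"},
]}
reader = {"type": "record", "name": "Student", "fields": [
    {"name": "Email", "type": "string", "default": "unknown"},  # hypothetical
    {"name": "Name", "type": "string"},
]}

print(resolve_record(writer, reader, {"Name": "Ann", "Age": 20}))
# {'Email': 'unknown', 'Name': 'Ann'}
```

Here Age is dropped because the reader does not ask for it, and Email is filled in from the reader's default.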

C. If both are enums
If the writer’s symbol is not present in the reader’s enum, an error occurs.

D. If both are arrays
The resolution algorithm is applied recursively to the reader’s and writer’s array item schemas.

E. If both are maps
The resolution algorithm is applied recursively to the reader’s and writer’s value schemas.

F. If the reader’s is a union, but the writer’s is not
The first schema in the reader’s union that matches the writer’s schema is recursively resolved against it. If none match, an error occurs.

G. If the writer’s is a union, but the reader’s is not
If the reader’s schema matches the selected writer’s schema, it is recursively resolved against it. If they do not match, an error occurs.
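
The two union cases can be sketched for primitive branches as follows. The code follows the wording of the rules literally (the first matching branch wins), which is an assumption about behavior; real Avro libraries may choose a branch differently. All names are our own.

```python
# Writer's type -> reader's types it may be promoted to.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
}


def primitives_match(writer: str, reader: str) -> bool:
    return writer == reader or reader in PROMOTIONS.get(writer, set())


def resolve_against_reader_union(writer: str, reader_union: list) -> str:
    """Case F: pick the first reader branch that matches the writer's type."""
    for branch in reader_union:
        # This sketch only considers branches given as primitive type names.
        if isinstance(branch, str) and primitives_match(writer, branch):
            return branch
    raise ValueError(f"no branch of the reader's union matches {writer!r}")


def resolve_from_writer_union(selected_branch: str, reader: str) -> str:
    """Case G: the branch actually written must match the reader's type."""
    if not primitives_match(selected_branch, reader):
        raise ValueError(f"{selected_branch!r} does not match {reader!r}")
    return reader


print(resolve_against_reader_union("int", ["null", "long", "double"]))  # long
print(resolve_from_writer_union("int", "long"))                         # long
```
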
So, this was all about Apache Avro Schema. We hope you liked our explanation.

5. Conclusion: Avro Schema

Hence, in this Avro Schema tutorial, we have learned about Apache Avro schemas in detail, including schema declaration and Avro schema resolution. We also saw an Avro schema example and how to create an Avro schema using JSON. Still, if you have any doubt, ask in the comment tab.