Flume Event Serializers – Apache Flume
Apache Flume is a distributed system for transferring data from external sources to centralized stores such as HDFS or HBase. An event serializer is the mechanism that converts a Flume event into another format for output.
In this article, you will explore what an event serializer is and walk through the different EventSerializers that ship with Apache Flume, along with their configuration options and examples.
Introduction to Event Serializers
An event serializer is a mechanism that converts a Flume event into another format for output. EventSerializer is an interface that allows arbitrary serialization of an event; both the hdfs sink and the file_roll sink support it.
Event serializers are similar in function to the Layout class in log4j. Flume ships with several serializers: the text serializer outputs the Flume event body, while the avro_event serializer creates an Avro representation of the event.
Let us now explore the EventSerializers that ship with Flume in detail.
1. Body Text Serializer
This is the default serializer. It writes the body of the Flume event to the output stream without any modification or transformation; any headers on the event are discarded.
The configuration options for the Body Text Serializer are as follows:

| Property Name | Default Value | Description |
|---|---|---|
| appendNewline | true | Whether a newline is appended to each Flume event body at write time. Defaults to true, which assumes that event bodies do not contain newlines, for legacy reasons. |
Example for an agent named agent1 with sink sk1 and channel ch1:

```
agent1.sinks = sk1
agent1.sinks.sk1.type = file_roll
agent1.sinks.sk1.channel = ch1
agent1.sinks.sk1.sink.directory = /var/log/flume
agent1.sinks.sk1.sink.serializer = text
agent1.sinks.sk1.sink.serializer.appendNewline = false
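The appendNewline behavior can be sketched as follows. This is a minimal illustration of the write semantics described above, not Flume's actual implementation:

```python
def serialize_bodies(bodies, append_newline=True):
    """Mimic the Body Text serializer: write each event body as-is,
    optionally followed by a newline. Headers are never written."""
    out = bytearray()
    for body in bodies:
        out += body
        if append_newline:
            out += b"\n"
    return bytes(out)

# Two events whose bodies contain no newlines:
serialize_bodies([b"first event", b"second event"])
# → b'first event\nsecond event\n'
```

With appendNewline = false, the bodies are written back to back, which is only sensible if the events already carry their own delimiters.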
2. “Flume Event” Avro Event Serializer
Alias: avro_event
This event serializer serializes Flume events into an Avro container file, using the same schema that Flume events use in the Avro RPC mechanism. It inherits from the AbstractAvroEventSerializer class.
The configuration options for the “Flume Event” Avro Event Serializer are as follows:

| Property Name | Default Value | Description |
|---|---|---|
| syncIntervalBytes | 2048000 | The Avro sync interval, in approximate bytes. |
| compressionCodec | null | The Avro compression codec. For the supported codecs, see Avro's CodecFactory docs. |
Example for an agent named agent1 with sink sk1 and channel ch1:

```
agent1.sinks.sk1.type = hdfs
agent1.sinks.sk1.channel = ch1
agent1.sinks.sk1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
agent1.sinks.sk1.serializer = avro_event
agent1.sinks.sk1.serializer.compressionCodec = snappy
```
3. Avro Event Serializer
Alias: none. This serializer must be specified using its fully qualified class name.
This event serializer also serializes Flume events into an Avro container file, like the “Flume Event” Avro Event Serializer. However, the record schema is configurable: it can be specified as a Flume configuration property or passed in a Flume event header.
To pass the record schema as a Flume configuration property, use the schemaURL property.
To pass the record schema in a Flume event header, use one of the following:
- Set the flume.avro.schema.literal header to the JSON representation of the schema.
- Set the flume.avro.schema.url header to a URL where the schema can be found.
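One way to stamp every event with such a header is a static interceptor on the source. The sketch below assumes a hypothetical source named s1 on the same agent; the schema path is illustrative:

```
agent1.sources.s1.interceptors = i1
agent1.sources.s1.interceptors.i1.type = static
agent1.sources.s1.interceptors.i1.key = flume.avro.schema.url
agent1.sources.s1.interceptors.i1.value = hdfs://namenode/path/to/schema.avsc
```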
This event serializer inherits from the AbstractAvroEventSerializer class.
The configuration options for the Avro Event Serializer are as follows:

| Property Name | Default Value | Description |
|---|---|---|
| syncIntervalBytes | 2048000 | The Avro sync interval, in approximate bytes. |
| compressionCodec | null | The Avro compression codec. For the supported codecs, see Avro's CodecFactory docs. |
| schemaURL | null | The Avro schema URL. A schema specified in the event header overrides this option. |
Example for an agent named agent1 with sink sk1 and channel ch1:

```
agent1.sinks.sk1.type = hdfs
agent1.sinks.sk1.channel = ch1
agent1.sinks.sk1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
agent1.sinks.sk1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
agent1.sinks.sk1.serializer.compressionCodec = snappy
agent1.sinks.sk1.serializer.schemaURL = hdfs://namenode/path/to/schema.avsc
```
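The file referenced by schemaURL is a standard Avro schema in JSON form. A minimal illustrative example (the record and field names here are hypothetical) might look like:

```json
{
  "type": "record",
  "name": "LogEvent",
  "fields": [
    {"name": "timestamp", "type": "long"},
    {"name": "message", "type": "string"}
  ]
}
```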
Summary
In short, an event serializer is a mechanism that converts Flume events into other formats for output. Flume ships with three EventSerializers: the Body Text Serializer, the “Flume Event” Avro Event Serializer, and the Avro Event Serializer. This article explained each of them along with their configuration properties and examples.