Apache Flume Sink Processors- Its Types

1. Objective

To invoke a particular sink from the selected group of sinks, we generally use sink processors. Also, we use it to create failover paths for our sinks or load balance events across multiple sinks from a channel. So, in this blog, we will learn all the types of Flume Sink Processors in Apache Flume. Moreover, we will see all examples to understand these Flume Sink Processors well.

What is Apache Flume Sink Processors

What is Apache Flume Sink Processors

2. Introduction to Apache Flume Sink Processors

Basically, to group multiple sinks into one entity Flume Sink groups allow users. However, we use these Flume Sink Processors to offer load balancing capabilities overall sinks inside the group. Also,  in case of temporal failure to achieve failover from one sink to another. Below table shows Flume Sink Processors with their property name & description.

Property NameDefaultDescription
sinksSpace-separated list of sinks that are participating in the group
processor.typedefaultThe component type name needs to be the default, failover or load_balance

Also, see an example for agent named a1:
For Example,
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
Let’s revise Apache Flume Architecture

3. Types of Apache Flume Sink Processors

i. Default Sink Processor

Although, Flume Sink processors accepts only a single sink. Still, as a user, it is not necessary to create processor (sink group) for single sinks. Despite that user can follow the source – channel – sink pattern.

ii. Failover Sink Processor

However, it does nothing but maintains a prioritized list of sinks, guaranteeing that so long as one is available events will be processed (delivered).
Let’s study Flume Interceptors in detail
In addition, this mechanism works by relegating failed sinks to a pool where they are assigned a cool down period, increasing with sequential failures before they are retired. However, once a sink successfully sends an event it is restored to the live pool.  Since Sinks have a priority associated with them, it means larger the number, higher the priority. Although, while sending an Event the next Sink with the highest priority shall be tried next for sending Events if in any case, Sink fails. To understand this let’s see an example, before the Sink with priority 80, a sink with priority 100 activates. Although, somehow, we can determine the priority on the basis of the order in which the Sinks are specified in the configuration if no priority is specified.
So, now let’s understand to configure it. At first, set a sink groups processor to failover and set priorities for all individual sinks. Make sure that all specified priorities must be unique. Afterwards, by using maxpenalty property we can set an upper limit to failover time(in milliseconds). Below table shows Failover Flume Sink Processors with their property name & description.

Property NameDefaultDescription
sinksSpace-separated list of sinks that are participating in the group
processor.typedefaultThe component type name, needs to be failover
processor.priority.<sinkName>Priority value. <sinkName> must be one of the sink instances associated with the current sink group A higher priority value Sink gets activated earlier. A larger absolute value indicates higher priority
processor.maxpenalty30000The maximum backoff period for the failed Sink (in millis)

Also, see an example for agent named a1:
For Example,
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000
Let’s look at Apache Flume Features

iii. Load balancing Sink Processor

It offers the ability to load-balance flow over multiple sinks. Also, this processor maintains an indexed list of active sinks on which the load must be distributed. Moreover, either via round_robin or random selection mechanisms, implementation supports distributing the load. Likewise, we can say the choice of selection mechanism defaults to round_robin type but we can override it via configuration. Also, custom classes support Custom selection mechanisms, that inherits from AbstractSinkSelector.
Do you Know Apache Flume Use Cases – Future Scope in detail
In addition, this selector picks the next sink using its configured selection mechanism and invokes it, when invoked. Also, the selector propagates the failure to the sink runner, if all sinks invocations result in failure. Below table shows loadbalancing Flume Sink Processors with their property name & description.

Property NameDefaultDescription
processor.sinksSpace-separated list of sinks that are participating in the group
processor.typedefaultThe component type name, needs to be load_balance
processor.backofffalseShould failed sinks be backed off exponentially
processor.selectorround_robinSelection mechanism. Must be either round_robin, random or FQCN of custom class that inherits from AbstractSinkSelector
processor.selector.maxTimeOut30000Used by backoff selectors to limit exponential backoff (in milliseconds)

So, let’s see an example for agent named a1:
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = random

iv. Custom Sink Processor

However, at the moment Custom Flume sink processors do not support.
Read Apache Flume Installation & Books to learn Apache Flume

4. Conclusion

As a result, we have seen Flume Sink Processors in detail with types of sink processors are Default Sink Processor, Failover Sink Processor, Load balancing Sink Processor, and Custom Sink Processor. Also, we have seen Flume Sink Processors examples to understand well. Still, if any doubt occurs, feel free to ask in the comment section.
See Also- Apache Flume Source
For reference

Leave a Reply

Your email address will not be published. Required fields are marked *