Flume Data Flow – Types & Failure Handling in Apache Flume

1. Objective – Flume Data Flow

We use Flume to move log data into HDFS, and that data follows a well-defined pattern as it flows through the system. Apache Flume supports several types of data flow, such as Multi-hop Flow, Fan-out Flow, and Fan-in Flow, all built on the Flume data flow model. So, in this article, we will learn about the Flume data flow as well as its types. Also, we will cover failure handling to understand this topic well.

2. Introduction to Flume Data Flow

Basically, we use the Flume framework to transfer log data into HDFS. Events and log data are generated by log servers, and these servers have Flume agents running on them. These agents receive the data from the data generators.
To be more specific, in Flume there are intermediate nodes that collect the data from these agents; these nodes are what we call Collectors. Just like agents, there can be multiple collectors in Flume.
Let’s revise Apache Flume Architecture & Flume Features
Afterwards, the data from all these collectors is aggregated and pushed to a centralized store, such as HBase or HDFS. To understand better, refer to the following Flume data flow diagram, which explains the Flume data flow model.

[Figure: Flume Data Flow Model]


3. Types of Data Flow in Flume

a. Multi-hop Flow

Basically, before reaching the final destination, an event may travel through more than one agent within Flume. This is what we call multi-hop data flow in Flume.
Let’s revise Flume Troubleshooting – Flume Known Issues in detail
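As a sketch, a multi-hop flow can be wired up by pointing one agent's Avro sink at the next agent's Avro source. The host name, port, log path, and HDFS path below are illustrative placeholders, not values from this article:

```properties
# Agent 1 (first hop): tails a log file and forwards events over Avro
agent1.sources = tail1
agent1.channels = ch1
agent1.sinks = avroSink1
agent1.sources.tail1.type = exec
agent1.sources.tail1.command = tail -F /var/log/app.log
agent1.sources.tail1.channels = ch1
agent1.channels.ch1.type = memory
agent1.sinks.avroSink1.type = avro
agent1.sinks.avroSink1.hostname = collector.example.com
agent1.sinks.avroSink1.port = 4141
agent1.sinks.avroSink1.channel = ch1

# Agent 2 (second hop): receives over Avro and writes to HDFS
agent2.sources = avroSrc1
agent2.channels = ch2
agent2.sinks = hdfsSink1
agent2.sources.avroSrc1.type = avro
agent2.sources.avroSrc1.bind = 0.0.0.0
agent2.sources.avroSrc1.port = 4141
agent2.sources.avroSrc1.channels = ch2
agent2.channels.ch2.type = memory
agent2.sinks.hdfsSink1.type = hdfs
agent2.sinks.hdfsSink1.hdfs.path = hdfs://namenode/flume/events
agent2.sinks.hdfsSink1.channel = ch2
```

Here the event makes two hops: generator → agent1 → agent2 → HDFS. More hops can be chained the same way, each agent's Avro sink feeding the next agent's Avro source.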

b. Fan-out Flow

In very simple language, when data flows from one source to multiple channels, that is what we call fan-out flow. Basically, in the Flume data flow model it falls into two categories −

i. Replicating

Replicating is the data flow in which the data is replicated into all of the configured channels.
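As an illustrative sketch (the agent, source, and channel names here are made up), a replicating selector is configured on the source so every event is copied to both channels:

```properties
agent1.sources = src1
agent1.channels = ch1 ch2
agent1.sources.src1.channels = ch1 ch2
# "replicating" is also the default selector when none is set
agent1.sources.src1.selector.type = replicating
```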

ii. Multiplexing

Multiplexing is the data flow in which the data is sent only to a selected channel, chosen according to a value mentioned in the header of the event.
Read about Apache Flume Source & Flume Event Serializers
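As a hedged sketch of a multiplexing selector (the header name `datatype` and the mapping values are hypothetical), events whose `datatype` header is `logs` go to one channel, `metrics` to another, and anything else to the default:

```properties
agent1.sources = src1
agent1.channels = ch1 ch2
agent1.sources.src1.channels = ch1 ch2
agent1.sources.src1.selector.type = multiplexing
agent1.sources.src1.selector.header = datatype
agent1.sources.src1.selector.mapping.logs = ch1
agent1.sources.src1.selector.mapping.metrics = ch2
agent1.sources.src1.selector.default = ch1
```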

c. Fan-in Flow

When it comes to fan-in flow, it is the data flow in which data is transferred from many sources into one channel.
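As a minimal sketch (names are illustrative), fan-in is simply several sources delivering into the same channel:

```properties
agent1.sources = src1 src2
agent1.channels = ch1
agent1.sinks = sink1
# Both sources feed the single channel ch1
agent1.sources.src1.channels = ch1
agent1.sources.src2.channels = ch1
agent1.sinks.sink1.channel = ch1
```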


4. Flume Failure Handling 

For each event, two transactions take place: one at the sender and one at the receiver. The sender sends events to the receiver. Soon after receiving the data, the receiver commits its own transaction and sends a “received” signal back to the sender. Only after receiving that signal does the sender commit its own transaction.
Note: The sender never commits its transaction until it receives the signal from the receiver. This guarantees that an event is not dropped from the sending hop until the receiving hop has safely accepted it.
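The handshake above can be modeled with a small sketch. This is not the real Flume API; the `Sender` and `Receiver` classes below are illustrative stand-ins that only show the ordering: the receiver commits and acknowledges first, and only then does the sender commit (i.e., drop the event from its pending set):

```python
class Receiver:
    """Models the receiving hop's transaction."""

    def __init__(self):
        self.committed = []  # events durably accepted at this hop
        self.staged = []     # events inside an open transaction

    def receive(self, event):
        # Stage the event inside the receiver's own transaction...
        self.staged.append(event)
        # ...commit it, and only then return the "received" ack.
        self.committed.extend(self.staged)
        self.staged.clear()
        return "received"


class Sender:
    """Models the sending hop's transaction."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.pending = []  # events whose sender-side transaction is open

    def send(self, event):
        self.pending.append(event)          # open the sender transaction
        ack = self.receiver.receive(event)  # deliver to the next hop
        if ack == "received":
            self.pending.remove(event)      # commit only after the ack
            return True
        return False                        # no ack: keep event for redelivery
```

A quick usage: `Sender(Receiver()).send("evt-1")` returns `True`, after which the event sits in the receiver's committed list and the sender's pending list is empty.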

5. Conclusion

So, in this Apache Flume data flow article, we have learned all the types of data flow possible in Apache Flume, namely Multi-hop Flow, Fan-out Flow, and Fan-in Flow, as well as failure handling in the Flume data flow model. Hope this content helps you. If you have any query, feel free to ask in the comment section.
See Also –
Apache Flume Installation
Best Apache Flume Books to Learn Flume
