Apache Flume Installation Tutorial – A Beginner's Guide

1. Objective

This tutorial walks through installing and configuring Apache Flume, then running a Flume NG agent that transports data into HDFS. Apache Flume is a tool for collecting, aggregating and transporting large amounts of streaming data, such as log files and events, from many different sources to a centralized data store like HDFS.

2. Apache Flume Installation

Follow the steps below to install and configure Flume:
Step 1:
Download the Flume binary tarball. This tutorial uses apache-flume-1.6.0-bin.tar.gz.
Step 2:
Move the downloaded archive to the desired location and extract it:
dataflair@ubuntu:~$ tar xzf apache-flume-1.6.0-bin.tar.gz
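
As a rough sketch of Step 2, the helper below checks that the archive exists before extracting it into a directory of your choice. The function name extract_flume and its two arguments are illustrative assumptions, not part of Flume; it uses only the standard tar options -x, -z, -f and -C.

```shell
# extract_flume is a hypothetical helper, not a Flume command.
# Usage: extract_flume <tarball> <destination-directory>
extract_flume() {
    tarball="$1"
    dest="$2"

    # Fail early with a clear message if the download is missing.
    if [ ! -f "$tarball" ]; then
        echo "Archive not found: $tarball" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    # -x extract, -z gunzip, -f archive file, -C switch to the destination first
    tar -xzf "$tarball" -C "$dest"
}
```

For example, extract_flume ~/apache-flume-1.6.0-bin.tar.gz ~ should leave the files under ~/apache-flume-1.6.0-bin, assuming the archive unpacks into a top-level directory of that name.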
Step 3:
Edit the “.bashrc” file in your home directory. This tutorial uses the nano editor:
dataflair@ubuntu:~$ nano .bashrc
Add FLUME_HOME, pointing at your Apache Flume directory, and append its bin directory to PATH:

export FLUME_HOME=/home/dataflair/apache-flume-1.6.0-bin/
export PATH=$PATH:$FLUME_HOME/bin/

Note: “/home/dataflair/apache-flume-1.6.0-bin” is the Flume directory on my machine. Replace it with the path where you extracted Flume.
After adding these lines, save the file by pressing “Ctrl+X”, then “Y”, then Enter. Then reload the .bashrc file so that the new environment variables take effect in the current shell:
dataflair@ubuntu:~$ source .bashrc
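
If you would rather not edit the file by hand, the same two lines can be appended from the command line. The sketch below is one possible approach, not part of Flume: add_flume_env is a hypothetical helper name, and it adds the exports only if no FLUME_HOME export is already present, so running it twice does not duplicate them.

```shell
# add_flume_env is a hypothetical helper, not part of Flume.
# Usage: add_flume_env <rc-file> <flume-directory>
add_flume_env() {
    rc_file="$1"      # e.g. "$HOME/.bashrc"
    flume_home="$2"   # e.g. "/home/dataflair/apache-flume-1.6.0-bin"

    # Skip if an earlier run (or a manual edit) already added the export.
    if grep -q '^export FLUME_HOME=' "$rc_file" 2>/dev/null; then
        return 0
    fi

    {
        printf 'export FLUME_HOME=%s\n' "$flume_home"
        # Single quotes keep $PATH and $FLUME_HOME literal in the rc file.
        printf 'export PATH=$PATH:$FLUME_HOME/bin\n'
    } >> "$rc_file"
}
```

A typical call would be add_flume_env "$HOME/.bashrc" "/home/dataflair/apache-flume-1.6.0-bin", followed by source ~/.bashrc as described above.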

Step 4:
To verify that Flume is configured correctly, run the command below in the terminal. If it prints the flume-ng usage summary, Flume is installed and configured.
dataflair@ubuntu:~$ flume-ng --help

Congratulations, you have successfully installed and configured Flume.
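
If the shell reports that flume-ng cannot be found instead, the PATH change from Step 3 has probably not taken effect. The following is a small troubleshooting sketch under that assumption; check_flume is a hypothetical helper name, while flume-ng version is the launcher's own subcommand for printing the installed version.

```shell
# check_flume is a hypothetical helper, not part of Flume.
check_flume() {
    if command -v flume-ng >/dev/null 2>&1; then
        echo "flume-ng found at: $(command -v flume-ng)"
        # Ask the launcher to print the installed Flume version.
        flume-ng version
    else
        echo "flume-ng not found on PATH; check FLUME_HOME and run 'source ~/.bashrc' again" >&2
        return 1
    fi
}
```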

3. Configuring Flume to copy data into HDFS

Step 1: Create a file named access.log in your home directory, add some data to it, and save it.
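
Any text will do as test data. As one possible way to produce it, the sketch below appends a few web-server-style lines to a file. The helper name make_sample_log and the line format are made up for illustration; Flume does not require any particular format here.

```shell
# make_sample_log is a hypothetical helper; the log format is illustrative only.
# Usage: make_sample_log <file> [number-of-lines]
make_sample_log() {
    log_file="$1"
    count="${2:-5}"   # default to 5 lines when no count is given
    i=1
    while [ "$i" -le "$count" ]; do
        printf '127.0.0.1 - - [%s] "GET /page%d.html HTTP/1.1" 200 1024\n' \
            "$(date '+%d/%b/%Y:%H:%M:%S %z')" "$i" >> "$log_file"
        i=$((i + 1))
    done
}
```

For this tutorial you could call make_sample_log "$HOME/access.log" 10. Because the helper appends, running it again while the agent is up also gives the tail -F source new lines to pick up.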
Step 2: Create a file named flume.conf inside /home/dataflair/apache-flume-1.6.0-bin/conf
and add the following parameters to it:

FileAgent.sources = tail
FileAgent.channels = Channel-2
FileAgent.sinks = HDFS
FileAgent.sources.tail.type = exec
FileAgent.sources.tail.command = tail -F /home/dataflair/access.log
FileAgent.sources.tail.channels = Channel-2
FileAgent.sinks.HDFS.type = hdfs
FileAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/flume
FileAgent.sinks.HDFS.hdfs.fileType = DataStream
FileAgent.sinks.HDFS.channel = Channel-2
FileAgent.channels.Channel-2.type = memory

Note: Here “/home/dataflair/access.log” is the path of the data file. Provide the path where you created your “access.log” file.
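
The configuration above leaves the memory channel size and the HDFS sink's file-rolling behaviour at Flume's defaults, which roll files quite often and can leave many small files in HDFS. The sketch below shows how such settings could be appended to flume.conf. The property names (capacity, transactionCapacity, hdfs.rollInterval, hdfs.rollSize, hdfs.rollCount) are standard Flume settings, but append_flume_tuning is a hypothetical helper and the values are illustrative assumptions rather than recommendations.

```shell
# append_flume_tuning is a hypothetical helper, not part of Flume.
# Usage: append_flume_tuning <path-to-flume.conf>
append_flume_tuning() {
    conf_file="$1"
    cat >> "$conf_file" <<'EOF'
# Memory channel: maximum events held, and events per transaction
FileAgent.channels.Channel-2.capacity = 1000
FileAgent.channels.Channel-2.transactionCapacity = 100
# HDFS sink: start a new file every 60 seconds only
# (0 disables the size-based and count-based triggers)
FileAgent.sinks.HDFS.hdfs.rollInterval = 60
FileAgent.sinks.HDFS.hdfs.rollSize = 0
FileAgent.sinks.HDFS.hdfs.rollCount = 0
EOF
}
```

These lines are optional; the agent in this tutorial runs without them.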
Step 3: From the Flume installation directory, start Flume to copy data to HDFS:
bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n FileAgent
Note: The agent name passed with -n (here FileAgent) must match the agent name used in the file passed with -f (here conf/flume.conf).
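
A mismatched agent name is easy to miss, and the name is case sensitive. As a quick pre-flight check, the sketch below looks for a "<agent>.sources" line in the configuration file. agent_defined is a hypothetical helper, and the check is a simple text match, not a full validation of the file.

```shell
# agent_defined is a hypothetical helper, not part of Flume.
# Usage: agent_defined <conf-file> <agent-name>
# Succeeds when the file declares sources for that agent name.
agent_defined() {
    conf_file="$1"
    agent="$2"
    # Matches lines such as "FileAgent.sources = tail" (case sensitive).
    grep -q "^[[:space:]]*${agent}\.sources[[:space:]]*=" "$conf_file"
}
```

It could be combined with the start command, for example agent_defined conf/flume.conf FileAgent && bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n FileAgent. The Flume user guide also shows adding -Dflume.root.logger=INFO,console to that command to print the agent's log to the terminal.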
To check whether the data was copied into HDFS, you can use either the web console (http://localhost:50070) or the command prompt to view the files present in HDFS.
On the web console, your files appear in the “/flume” directory.

In this way, you can copy your data into HDFS using Flume.
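
For the command-prompt route, a minimal sketch using the standard hdfs dfs commands might look like this. It assumes the hdfs client from your Hadoop installation is on PATH and that HDFS is running; check_flume_output is a hypothetical helper name.

```shell
# check_flume_output is a hypothetical helper, not part of Flume or Hadoop.
# Usage: check_flume_output [hdfs-directory]   (defaults to /flume)
check_flume_output() {
    hdfs_dir="${1:-/flume}"

    if ! command -v hdfs >/dev/null 2>&1; then
        echo "hdfs command not found on PATH" >&2
        return 1
    fi

    # List the files the sink has written so far.
    hdfs dfs -ls "$hdfs_dir" || return 1
    # Print their contents; the quotes leave the glob for HDFS to expand.
    hdfs dfs -cat "$hdfs_dir/*"
}
```

If the listing reports that the path does not exist, the sink has most likely not written anything yet. In that case it is worth checking the agent's log output and whether the NameNode address really is hdfs://localhost:9000 as configured. The directory can also be created by hand with hdfs dfs -mkdir -p /flume.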
Learn more about HDFS commands and how data read and write operations are performed in HDFS.

2 Responses

  1. DTR Prasad says:

    When I ran this program, I was not able to see the copied file, as it says
    “path does not exist on HDFS or WebHDFS is disabled enable WebHDFS”
    Where can I see it?

  2. Luz Aguilar says:

    Thanks for the explanation.
    I am new to this topic. I have a problem accessing the flume directory on the web console: “Path does not exist on HDFS”. How can I create a new folder in HDFS to put the data there? I thought that was done by the configuration file.
