Apache Flume Installation Tutorial – A Beginner's Guide
This Flume tutorial gives easy steps for installing and configuring Apache Flume. This quick start will help you set up an Apache Flume environment and run a Flume NG agent that transports data into HDFS. Apache Flume is a tool for collecting, aggregating and transporting large amounts of streaming data, such as log files and events, from many different sources to a centralized data store like HDFS. To learn more about Flume, follow this introductory guide.
2. Apache Flume Installation
Follow the steps below to install and configure Flume:
Download Flume from the link below:
Move the downloaded archive to the desired location (here, the home directory) and extract it:
dataflair@ubuntu:~$ tar xzf apache-flume-1.6.0-bin.tar.gz
Edit the “.bashrc” file with the following command:
dataflair@ubuntu:~$ nano .bashrc
Add your Apache Flume directory as FLUME_HOME and append its bin directory to PATH:
export FLUME_HOME=/home/dataflair/apache-flume-1.6.0-bin
export PATH=$PATH:$FLUME_HOME/bin
Note: “/home/dataflair/apache-flume-1.6.0-bin” is the path of my Apache Flume directory. Replace it with the path of your own Apache Flume installation.
After adding these lines, save the file by pressing “Ctrl+X”, then “Y”, then Enter. Next, reload the .bashrc file so that the new environment variables take effect in the current shell:
dataflair@ubuntu:~$ source .bashrc
To verify that Flume has been configured correctly, run the following command in the terminal. If it prints the flume-ng usage and help text, Flume has been installed and configured successfully.
dataflair@ubuntu:~$ flume-ng --help
Congratulations, you have successfully installed and configured Flume.
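If flume-ng --help reports “command not found” instead, the usual cause is a mistake in the FLUME_HOME or PATH lines in .bashrc. The shell function below is a minimal sketch of a check for those two settings; the name check_flume_env is hypothetical, and it assumes FLUME_HOME points at the extracted apache-flume-1.6.0-bin directory.

```shell
# Hypothetical helper: sanity-check the Flume environment set in ~/.bashrc.
# Assumes FLUME_HOME should point at the extracted apache-flume-*-bin directory.
check_flume_env() {
  local home="${FLUME_HOME:-}"
  home="${home%/}"   # drop a trailing slash so "$home/bin" never contains "//"

  if [ -z "$home" ]; then
    echo "FLUME_HOME is not set; did you run 'source .bashrc'?" >&2
    return 1
  fi

  if [ ! -x "$home/bin/flume-ng" ]; then
    echo "No executable flume-ng found in $home/bin" >&2
    return 1
  fi

  # command -v searches PATH, so this fails if \$FLUME_HOME/bin was not appended.
  if ! command -v flume-ng >/dev/null 2>&1; then
    echo "flume-ng is not on PATH; append \$FLUME_HOME/bin to PATH" >&2
    return 1
  fi

  echo "Flume environment looks OK: $(command -v flume-ng)"
}
```

Run check_flume_env in a new terminal (or after source .bashrc). It prints the resolved flume-ng path on success and a hint on stderr otherwise.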
3. Configuring Flume to copy data into HDFS
Step 1: Create an access.log file in your home directory, add some data to it, and save it.
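If you do not have a real log file at hand, a few made-up lines are enough for this exercise. The sketch below writes three sample entries in Apache common log format to ~/access.log; LOG_FILE is a hypothetical override variable, and the entries are invented examples. Because tail -F prints the last 10 lines of a file when it starts, these lines should be sent to HDFS as soon as the agent from Step 3 starts.

```shell
# Create a small sample access.log for the exec source (tail -F) to read.
# The entries are made-up examples; any text works, since each line becomes one event.
LOG_FILE="${LOG_FILE:-$HOME/access.log}"

cat > "$LOG_FILE" <<'EOF'
127.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024
127.0.0.1 - - [01/Jan/2024:10:00:05 +0000] "GET /about.html HTTP/1.1" 200 2048
127.0.0.1 - - [01/Jan/2024:10:00:09 +0000] "GET /missing.html HTTP/1.1" 404 512
EOF

echo "Wrote $(wc -l < "$LOG_FILE") lines to $LOG_FILE"
```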
Step 2: Create a file named flume.conf inside /home/dataflair/apache-flume-1.6.0-bin/conf and add the following properties to it:
FileAgent.sources = tail
FileAgent.channels = Channel-2
FileAgent.sinks = HDFS
FileAgent.sources.tail.type = exec
FileAgent.sources.tail.command = tail -F /home/dataflair/access.log
FileAgent.sources.tail.channels = Channel-2
FileAgent.sinks.HDFS.type = hdfs
FileAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/flume
FileAgent.sinks.HDFS.hdfs.fileType = DataStream
FileAgent.sinks.HDFS.channel = Channel-2
FileAgent.channels.Channel-2.type = memory
Note: Here “/home/dataflair/access.log” is the path of the data file. Replace it with the path where you created your “access.log” file.
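The configuration above relies on Flume's defaults: the memory channel holds at most 100 events, and the HDFS sink rolls to a new file every 30 seconds, 1024 bytes, or 10 events, whichever comes first, which tends to produce many small files in HDFS. The optional sketch below appends a few tuning properties to flume.conf. The values are illustrative assumptions rather than recommendations, and CONF_FILE is a hypothetical override variable.

```shell
# Optional tuning for the flume.conf created in Step 2.
# Assumption: the file lives at $FLUME_HOME/conf/flume.conf; set CONF_FILE otherwise.
CONF_FILE="${CONF_FILE:-${FLUME_HOME:-$HOME/apache-flume-1.6.0-bin}/conf/flume.conf}"
mkdir -p "$(dirname "$CONF_FILE")"

# Append only once, so re-running this snippet does not duplicate properties.
if ! grep -q '^FileAgent.channels.Channel-2.capacity' "$CONF_FILE" 2>/dev/null; then
  cat >> "$CONF_FILE" <<'EOF'

# Memory channel: buffer up to 1000 events, move at most 100 per transaction.
FileAgent.channels.Channel-2.capacity = 1000
FileAgent.channels.Channel-2.transactionCapacity = 100

# HDFS sink: roll to a new file every 60 seconds only (0 disables size/count rolling).
FileAgent.sinks.HDFS.hdfs.rollInterval = 60
FileAgent.sinks.HDFS.hdfs.rollSize = 0
FileAgent.sinks.HDFS.hdfs.rollCount = 0
EOF
fi
```

Keep transactionCapacity no smaller than the HDFS sink's batch size (hdfs.batchSize, 100 by default), otherwise the sink cannot take a full batch from the channel in one transaction.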
Step 3: From the Flume installation directory, start the Flume agent to copy data to HDFS:
bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n FileAgent
Note: The agent name is specified by -n FileAgent and must match the agent name used as the property prefix in conf/flume.conf (here, FileAgent).
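Because the exec source runs tail -F, lines appended to access.log after the agent starts are shipped to HDFS as well. To watch this happen, you can append lines from a second terminal while the agent is running. The function below is a minimal sketch of that; append_sample_events, LOG_FILE and INTERVAL are hypothetical names, and the log lines are made up.

```shell
# Hypothetical helper: simulate a live log by appending lines to access.log.
# tail -F in the exec source picks up each appended line as a new Flume event.
append_sample_events() {
  local count="${1:-5}"                        # how many lines to append
  local log_file="${LOG_FILE:-$HOME/access.log}"
  local interval="${INTERVAL:-1}"              # seconds to wait between lines
  local i=1
  while [ "$i" -le "$count" ]; do
    # Made-up entry in Apache common log format, stamped with the current time.
    printf '127.0.0.1 - - [%s] "GET /page%d.html HTTP/1.1" 200 512\n' \
      "$(date '+%d/%b/%Y:%H:%M:%S %z')" "$i" >> "$log_file"
    sleep "$interval"
    i=$((i + 1))
  done
}
```

For example, append_sample_events 10 appends ten lines, one per second, which should then appear in the files under /flume.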
To check whether the data was copied into HDFS, you can either use the NameNode web console (http://localhost:50070 on Hadoop 2.x; Hadoop 3.x uses port 9870) or list the files in HDFS from the command line.
In the web console, browse to the “/flume” directory to see the files written by Flume.
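From the command line, the hdfs dfs client can list and print the files under /flume. The sketch below assumes the Hadoop hdfs command is on PATH and that the sink keeps its default file prefix, FlumeData; files that are still being written carry the default .tmp suffix. The function name show_flume_output is hypothetical.

```shell
# Hypothetical helper: inspect the HDFS sink's output from the command line.
# Assumes the Hadoop "hdfs" client is on PATH and hdfs.path = hdfs://localhost:9000/flume.
show_flume_output() {
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs not found on PATH; add your Hadoop bin directory to PATH" >&2
    return 1
  fi
  # List the files Flume has written; files still open for writing end in .tmp.
  hdfs dfs -ls hdfs://localhost:9000/flume || return 1
  # Quote the glob so HDFS expands it, not the local shell.
  hdfs dfs -cat 'hdfs://localhost:9000/flume/FlumeData.*'
}
```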
In this way, you can copy your data into HDFS using Flume.
Learn more about HDFS commands and how data read and write operations are performed in HDFS.