Apache Flink Cluster Setup on CentOS | Installation Process

1. Objective

Through this Apache Flink installation tutorial, we will understand how to set up a multi-node Apache Flink cluster. We will cover the prerequisites for the Flink cluster setup, the Apache Flink cluster configuration, and the installation of Flink itself. We will also look at starting the Flink cluster, stopping it, and how to run Flink applications after completing the Apache Flink cluster setup on CentOS/RedHat.

So, let’s start Apache Flink Cluster Setup Tutorial.


2. Introduction to Apache Flink Cluster setup on CentOS

Before we start setting up the Flink cluster, let us revise our Flink concepts.

As we know, Apache Flink is a key Big Data platform, and we have already seen what Apache Flink is, its features, and its real-time use cases. Now let us learn how to install Apache Flink on CentOS: the prerequisites for an Apache Flink cluster, and the various commands and setup steps required for a complete Flink installation.

a. Platform for Apache Flink Installation on CentOS

  • OS: Linux is supported as a development and production platform. Here we will use CentOS/RedHat for the Flink installation.
  • Flink: Apache Flink 1.x (flink-1.1.3-bin-hadoop26-scala_2.10.tgz)

3. Install Flink on Master

i. Prerequisites for Apache Flink Cluster

a.  Add Entries in hosts file

You need to edit the hosts file ($sudo nano /etc/hosts) and add entries for the master and slaves as below:

MASTER-IP master
SLAVE01-IP slave01
SLAVE02-IP slave02 

(NOTE: Replace MASTER-IP, SLAVE01-IP, and SLAVE02-IP with the corresponding IP addresses.)
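
For example, with hypothetical private-network addresses, the finished entries might look like this:

```
192.168.1.10 master
192.168.1.11 slave01
192.168.1.12 slave02
```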

b. Install Java 8 (Recommended Oracle Java)

You need to perform the below steps for Java 8 installation on CentOS:
Download Archive File
Download the Java 8 archive (the 32-bit build is shown here; pick the 64-bit build for 64-bit systems):

$ cd /opt/
$ wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u25-b17/jdk-8u25-linux-i586.tar.gz"
$ tar xzf jdk-8u25-linux-i586.tar.gz

Install JAVA
After extracting the tar file, we need to register the new version of Java using alternatives. Use the following commands to do it.

$ cd /opt/jdk1.8.0_25/
$ alternatives --install /usr/bin/java java /opt/jdk1.8.0_25/bin/java 2
$ alternatives --config java

Sample output:
There are 3 programs which provide ‘java’.
/opt/jdk1.8.0/bin/java
/opt/jdk1.7.0_55/bin/java
/opt/jdk1.8.0_25/bin/java
Select the number corresponding to /opt/jdk1.8.0_25/bin/java.

Once the Java 8 installation on the server is done, we need to set up javac and jar using the following commands:
$ alternatives --install /usr/bin/jar jar /opt/jdk1.8.0_25/bin/jar 2
$ alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_25/bin/javac 2
$ alternatives --install /usr/bin/javaws javaws /opt/jdk1.8.0_25/bin/javaws 2
$ alternatives --set jar /opt/jdk1.8.0_25/bin/jar
$ alternatives --set javac /opt/jdk1.8.0_25/bin/javac 

Check JAVA Version
Use the following command to check the Java version:

$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode) 

Setup Environment Variables
Follow below steps to set Java environment:
Setup JAVA_HOME Variable
$ export JAVA_HOME=/opt/jdk1.8.0_25
Setup JRE_HOME Variable
$ export JRE_HOME=/opt/jdk1.8.0_25/jre
Setup PATH Variable
$ export PATH=$PATH:/opt/jdk1.8.0_25/bin:/opt/jdk1.8.0_25/jre/bin
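
These export commands last only for the current shell session. To make them permanent, a common approach is to append them to the user’s ~/.bash_profile (a sketch, assuming the JDK was extracted to /opt/jdk1.8.0_25 as above):

```
# Appended to ~/.bash_profile so the variables survive re-login
export JAVA_HOME=/opt/jdk1.8.0_25
export JRE_HOME=/opt/jdk1.8.0_25/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
```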

c. Configure SSH

Below are the steps for SSH configuration:

Install Open SSH Server-Client:
$sudo yum -y install openssh-server openssh-client

Start the SSH Services

$sudo chkconfig sshd on
$sudo service sshd start

Generate Key Pairs:
$ssh-keygen -t rsa -P ""

Configure password-less SSH:
Copy the content of .ssh/id_rsa.pub (of the master) to .ssh/authorized_keys (of all the slaves as well as the master).
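
One way to do this, assuming the openssh-clients package provides it on your system, is the ssh-copy-id helper, which appends the master’s public key to the remote authorized_keys file and fixes its permissions:

```
$ssh-copy-id -i ~/.ssh/id_rsa.pub slave01
$ssh-copy-id -i ~/.ssh/id_rsa.pub slave02
$ssh-copy-id -i ~/.ssh/id_rsa.pub master
```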

Check by SSH to all the Slaves:

$ssh slave01
$ssh slave02

ii. Install Flink on RedHat/CentOS

Now we are all ready with the prerequisites to install Flink. Let us start Flink installation on RedHat/CentOS.

a. Download Flink

You need to download the below Flink setup for installation:
http://www-eu.apache.org/dist/flink/flink-1.1.3/flink-1.1.3-bin-hadoop26-scala_2.10.tgz

b. Untar Tar ball

$tar xzf flink-1.1.3-bin-hadoop26-scala_2.10.tgz
(Note: All the required jars, scripts, configuration files, etc. are available in FLINK_HOME directory (flink-1.1.3))

c. Setup Configuration

Now set the required Flink configuration as below:

Edit .bashrc

Edit the .bashrc file located in the user’s home directory and add the following environment variables:

export FLINK_HOME=$HOME/flink-1.1.3/
export PATH=$PATH:$FLINK_HOME/bin 

(Note: After the above step, restart the terminal/PuTTY session so that all the environment variables take effect.)

Edit flink-conf.yaml:
Edit the configuration file flink-conf.yaml (located in FLINK_HOME/conf) and specify the master node (JobManager):

$nano flink-conf.yaml
jobmanager.rpc.address: master 
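
Besides the JobManager address, flink-conf.yaml accepts further tuning keys. A minimal sketch with illustrative values (adjust the slot count to the cores available on each slave):

```
jobmanager.rpc.address: master
jobmanager.rpc.port: 6123
taskmanager.numberOfTaskSlots: 2
parallelism.default: 2
```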

Edit Slaves:
Edit configuration file slaves (located in FLINK_HOME/conf) and add following entries:

$nano slaves
slave01
slave02

Flink is now set up on the master; next, install Flink on all the slaves.

iii. Install Flink On Slaves

Below are the steps required to be performed for installing Apache Flink on Slave nodes:

a. Setup Pre-requisites on all the slaves

Run following steps on all the slaves:

  • “a. Add Entries in hosts file”
  • “b. Install Java 8 (Recommended Oracle Java)”

b. Copy configured setups from master to all the slaves

Create tar-ball of configured setup:
$ tar czf flink.tar.gz flink-1.1.3
(NOTE: Run this command on Master)

Copy the configured tar-ball on all the slaves
$ scp flink.tar.gz slave01:~
(NOTE: Run this command on Master)
$ scp flink.tar.gz slave02:~
(NOTE: Run this command on Master)

c. Un-tar configured flink setup on all the slaves

$tar xzf flink.tar.gz
(NOTE: Run this command on all the slaves)
Flink is now set up on all the slaves. Now let us start the cluster.

iv. Start the Apache Flink Cluster

Once the Flink setup on the master and slaves is complete, we need to start the Flink services as below:

a. Start the Services

$bin/start-cluster.sh
(Note: Run this command on Master)

b. Check whether services have been started

Use the commands as shown below to check the status of the services:

Check daemons on Master

$jps
JobManager

Check daemons on Slaves

$jps
TaskManager
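
Since password-less SSH is already configured, the daemon check can be run from the master in a single loop (a sketch, assuming the slave hostnames from the hosts file):

```
for host in slave01 slave02; do
  echo "== $host =="
  ssh "$host" jps
done
```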

v. Play with Apache Flink

As the Flink setup on the master and slaves is complete and all services are running fine, let us run Flink applications:

a. Flink Web UI

http://<Master-IP>:8081
The UI will show the information about job manager, task managers, jobs, etc.

b. Run Flink Application

$ bin/flink run <Jar-Path> -input <Input-Path> -output <Output-Path>
Note: If you are using the local FS for input, the input file must be available on all the nodes of the cluster. To use HDFS, specify a path like hdfs://master:9000/<Path>
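
As a first test, you can submit one of the example jobs shipped with the Flink distribution, e.g. the batch WordCount (paths are illustrative; this example takes --input/--output as program arguments):

```
$ bin/flink run examples/batch/WordCount.jar \
  --input hdfs://master:9000/input/words.txt \
  --output hdfs://master:9000/output/wordcount
```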

vi. Stop the Flink Cluster

Once you are done experimenting with Flink, let us learn how to stop the Flink cluster.
Use below commands for the same:

a. Stop the Apache Flink Services

$bin/stop-cluster.sh
(Note: Run this command on Master)
Now that we have learned how to do a Flink installation on a multi-node cluster on CentOS/RedHat, you can explore some real-life Flink use cases and commands to play with Apache Flink.

So, this was all in the Apache Flink Cluster Setup tutorial. We hope you liked our explanation.

4. Conclusion – Apache Flink Cluster

Hence, in this Apache Flink cluster setup tutorial, we discussed Flink installation on CentOS, and we saw how to install Flink on the master and the slaves. Still, if you have any confusion, ask in the comment section.

