Install and Configure Apache Flink on Ubuntu

1. Objective

In this Flink tutorial, we will learn how to install Apache Flink on Ubuntu. Apache Flink is a streaming dataflow engine that processes data at lightning-fast speed; to understand what Flink is, see the Flink introduction guide. In this deployment tutorial, we will see how to install Apache Flink in standalone mode and how to run a sample program.

2. Apache Flink Installation on Ubuntu

i. Platform

a. Platform Requirements

Operating system: Ubuntu 14.04 or later. Other Linux distributions such as CentOS or Red Hat can also be used.
In this tutorial, we will install Apache Flink 1.x (the commands below use version 1.1.3).

b. Configure & Setup Platform

If you are using Windows or Mac OS, you can create a virtual machine and install Ubuntu on it using VMware Player or, alternatively, Oracle VirtualBox.

ii. Install Java

Apache Flink runs on the JVM, so Java must be installed first. Let's begin by installing Java.

a. Install Python Software Properties

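$ sudo apt-get install python-software-properties
On newer Ubuntu releases where this package is no longer available, the software-properties-common package provides the add-apt-repository command used in the next step.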

b. Add Repository

$ sudo add-apt-repository ppa:webupd8team/java

c. Update the source list

$ sudo apt-get update

d. Install Java

$ sudo apt-get install oracle-java7-installer
This command downloads and installs Java automatically.

e. Verify Java Installation

To check whether the installation completed successfully, and to see which version of Java was installed, use the command below:
$ java -version
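The version check can also be scripted. The following is a minimal sketch, not part of the original tutorial: java_major_version is a hypothetical helper name, and it assumes the usual version "..." string that java -version prints, in either the older 1.7.0_80 style or the newer 11.0.2 style.

```shell
#!/bin/sh
# Hypothetical helper: extract the major Java version from `java -version` output.
# Handles the legacy "1.7.0_80" scheme and the newer "11.0.2" scheme.
java_major_version() {
  jv=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$jv" in
    "")  return 1 ;;                                   # no version string found
    1.*) jv=${jv#1.}; printf '%s\n' "${jv%%.*}" ;;     # 1.7.0_80 -> 7
    *)   printf '%s\n' "${jv%%[.-]*}" ;;               # 11.0.2 -> 11
  esac
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, so redirect it to stdout before parsing.
  major=$(java_major_version "$(java -version 2>&1)") || major=unknown
  echo "Detected Java major version: $major"
else
  echo "java not found on PATH; install Java first" >&2
fi
```

If the script reports that java is missing, repeat the installation steps above before continuing.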

iii. Install Apache Flink

a. Download the Apache Flink

You can download Flink from the downloads page of the official Apache Flink website. The commands below assume the flink-1.1.3-bin-hadoop26-scala_2.11.tgz package.

b. Untar the setup file

Move the downloaded setup file to your home directory and run the command below to extract Flink:
dataflair@ubuntu:~$ tar xzf flink-1.1.3-bin-hadoop26-scala_2.11.tgz

c. Rename the installation Directory

dataflair@ubuntu:~$ mv flink-1.1.3/ flink
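The extract and rename steps can be combined into one guarded function. This is only a sketch under stated assumptions: extract_flink is a hypothetical name, and it assumes the archive unpacks into a single top-level directory (as the flink-1.1.3 tarball above does) whose name appears as the first entry in the archive listing.

```shell
#!/bin/sh
# Hypothetical helper: extract a Flink tarball into dest_parent and rename
# the unpacked top-level directory to "flink".
# Refuses to run if the archive is missing or "flink" already exists.
extract_flink() {
  archive="$1"
  dest_parent="$2"
  [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
  [ ! -e "$dest_parent/flink" ] || { echo "$dest_parent/flink already exists" >&2; return 1; }
  # The first entry in the listing gives the top-level directory name.
  top_dir=$(tar tzf "$archive" | head -n 1 | cut -d/ -f1)
  [ -n "$top_dir" ] || { echo "could not determine top-level directory" >&2; return 1; }
  tar xzf "$archive" -C "$dest_parent" || return 1
  mv "$dest_parent/$top_dir" "$dest_parent/flink"
}

# Example call (commented out so that nothing runs unintentionally):
# extract_flink "$HOME/flink-1.1.3-bin-hadoop26-scala_2.11.tgz" "$HOME"
```

The existence check keeps a second run from moving the new directory inside an existing flink directory, which is what a plain mv would do.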

d. Change the working directory to Flink Home

To start the Flink services and run the sample program, change to the flink directory using the command below:
dataflair@ubuntu:~$ cd flink

e. Start Flink

To start Apache Flink in local mode, use this command:
dataflair@ubuntu:~/flink$ bin/start-local.sh

f. Check status

Check which services are running:
dataflair@ubuntu:~/flink$ jps

The output should look like this (the process IDs will differ):
6740 Jps
6725 JobManager
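The same status check can be automated by scanning the jps output for a JobManager entry. This is a small sketch; has_jobmanager is a hypothetical helper name, and it assumes the two-column "PID name" format shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given jps output lists a JobManager process.
has_jobmanager() {
  printf '%s\n' "$1" | awk '$2 == "JobManager" { found = 1 } END { exit !found }'
}

if command -v jps >/dev/null 2>&1; then
  if has_jobmanager "$(jps)"; then
    echo "JobManager is running"
  else
    echo "JobManager is not running"
  fi
else
  # jps ships with the JDK, so it may be absent on a JRE-only install.
  echo "jps not found on PATH" >&2
fi
```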

g. Apache Flink Web UI

To open the web UI, browse to the following URL:

localhost:8081

iv. Run Wordcount example on Flink

Before running the WordCount example, create an input file with some sample text in your home directory and save it as input.txt.
Then run the example with the following command:
dataflair@ubuntu:~/flink$ bin/flink run examples/batch/WordCount.jar -input /home/dataflair/input.txt -output /home/dataflair/output.txt
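To prepare a sample input and get reference counts to compare the job's output against, something like the sketch below may help. The function names and sample text are hypothetical. The pipeline lowercases the text and splits on non-alphanumeric characters, which is assumed (not confirmed by this tutorial) to approximate how the bundled example tokenizes words, so ordering and edge cases may differ from Flink's output.

```shell
#!/bin/sh
# Hypothetical helper: write a small sample text file (arbitrary illustration data).
make_sample_input() {
  printf '%s\n' 'to be or not to be' 'that is the question' > "$1"
}

# Hypothetical helper: print "word count" pairs, one per line, sorted by word.
# Lowercases the text and treats any run of non-alphanumeric characters as a separator.
reference_word_count() {
  tr '[:upper:]' '[:lower:]' < "$1" \
    | tr -cs '[:alnum:]' '\n' \
    | sed '/^$/d' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Example calls (commented out; the path matches the command above):
# make_sample_input /home/dataflair/input.txt
# reference_word_count /home/dataflair/input.txt
```

Comparing these counts with the contents of output.txt is a quick sanity check that the job processed the whole file.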
To install Flink on a real multi-node cluster in distributed standalone mode, follow the multi-node cluster installation tutorial.

3. Conclusion

4 Responses

  1. Koen says:

    It seems like the “non”-Hadoop version works also just on OSx! it just needs java! (and python?)

  2. dc says:

    Do I have to have the hadoop vm installed already? I used java 8 , and the new version of flink would that work?

  3. DF HD Team says:

    Yes, You can run Apache Flink without Hadoop installation. Though Apache Flink also provides native support to interact with Hadoop, and Flink can use Hadoop input format as well as output format. Apache Flink needs Java / Scala to run applications. Java 7 as well as Java 8 is supported with Flink, you can use any of the Java version.

  4. Eamale says:

    Greatly explained how to install apache flink on ubuntu. Thanks for updating me with latest technology!!
