Create and Run First Apache Flink Application in Java in Eclipse 6


1. Objective

This step by step tutorial will explain how to create an Apache Flink project in Eclipse and how to submit the application after the creation of jar. Learn how to configure development environment for developing Apache Flink Application. If you want to learn more about Apache Flink you can refer this Apache Flink introduction guide.

Create and Run First Apache Flink Application in Java in Eclipse Training Tutorial DataFlair

2. Create An Apache Flink Application in Java

This section will guide you step by step to create an Apache Flink application in java in eclipse-

2.1. Platform

  • Operating system: Ubuntu (or any flavor of Linux)
  • Java 7.x or higher
  • Eclipse – Latest version

2.2. Steps to create project

i. Create a new java project

Apache Flink Application - make new project

ii. You can name the project as WordCount or with a name of your choice.

create an Apache Flink Application - make new project step 2

iii. Add the following JAR in the build path. You can find the jar files in the lib directory in Flink home:

  • flink-dist_2.11-1.0.3.jar
  • flink-python_2.11-1.0.3.jar
  • log4j-1.2.17.jar
  • slf4j-log4j12-1.7.7.jar

iv. Right Click on the project and select Configure build path option from build path

Apache Flink Project - configure buid path

v. Add the jar Files from lib folder present in Apache flink home

Create Flink Application in Java - add jar

2.3. Make a class WordCount

Apache Flink Application - make new class

2.4. Copy below Apache Flink Wordcount Code in Editor

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.aggregation.Aggregations;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;

import org.apache.flink.util.Collector;


public class WordCount {
	  
	  public static void main(String[] args) throws Exception {
	    
	    // set up the execution environment

 final ParameterTool params = ParameterTool.fromArgs(args);
	    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
	    env.getConfig().setGlobalJobParameters(params);
	    
	   
	    // get input data
	  
	    DataSet<String>	text = env.readTextFile(params.get("input"));
	    
	    DataSet<Tuple2<String, Integer>> counts = 
	        // split up the lines in pairs (2-tuples) containing: (word,1)
	        text.flatMap(new Splitter())
	        // group by the tuple field "0" and sum up tuple field "1"
	        .groupBy(0)
	        .aggregate(Aggregations.SUM, 1);

	    // emit result
	    
	    counts.writeAsText(params.get("output"));
	    // execute program
	    env.execute("WordCount Example");
	  }
	}
	//The operations are defined by specialized classes, here the Splitter class.

	 class Splitter implements FlatMapFunction<String, Tuple2<String, Integer>> {

	  @Override
	  public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
	    // normalize and split the line into words
	    String[] tokens = value.split("\\W+");
	    
	    // emit the pairs
	    for (String token : tokens) {
	      if (token.length() > 0) {
	        out.collect(new Tuple2<String, Integer>(token, 1));
	      }
	    }
	  }
	}

2.5. Export the Apache Flink Jar File

Before running created Apache Flink word count application we have to create a jar file. Right click on project >> export

Create an Apache Flink Application - export jar

Select Jar-file Option to Export

Create a project in Flink - choose jar option

Create the Jar file

Apache Flink Application - export jar v2

2.6. Go to Apache Flink Home Directory

i. start the Apache Flink services by using following commands

cd Flink
bin/start-local.sh

Apache Flink Application - flink home

2.7. Sample Data

Save the following data as input.txt, according to our command it is saved in a home folder.

DataFlair services pvt ltd provides training in Big Data Hadoop, Apache Spark, Apache Flink, Apache Kafka, Hbase, Apache Hadoop Admin
10000 students are taking training from DataFlair services pvt ltd 
The chances of getting good job in big data hadoop is high
If you want to become an expert hadoop developer then enroll now!! hurry up!!

2.8. Submit Apache Flink Application

Use the following command to Submit the Apache Flink Application

bin/flink run --class <class-name> <Jar path>   --input <Input File Path>  --output <Output Flie Path>
bin/flink run --class WordCount /home/dataflair/WordCount.jar   --input file:///home/dataflair/input.txt  --output  file:///home/dataflair/Desktop/output.txt<br/>

We have successfully exported the Apache Flink program.To learn more about the installation of Apache Flink you can refer this Apache Flink installation guide.

Let’s understand above command:

--class is used to specify the main class which is an entry point of your program

/home/dataflair/WordCount.jar   - It is the jar file in which your program exist

--input is used to specify the input path of your file

--output is used to specify the output path of your file

Apache Flink run command

When you will run the above command you will get the following result on your terminal-

Apache Flink run command output

2.9. Output

Here is the output of our Apache Flink Word Count program.

Apache Flink Application Output

We can also create Apache Flink project in Scala as well. To Learn Scala follow this Scala tutorial.


Leave a comment

Your email address will not be published. Required fields are marked *

6 thoughts on “Create and Run First Apache Flink Application in Java in Eclipse