Apache Flink – Run Wordcount program in Eclipse 1


1. Objective

In our previous guides, we discussed how to install Apache Flink on ubuntu. In this tutorial, we will understand how to develop and run Apache Flink wordcount program in Java in eclipse.

We can also use Scala language to write wordcount program in Apache Flink. To learn Scala get the best Scala Books from here.

Apache Flink wordcount program tutorial

2. Platform

  1. Operating system: You can run the code in Windows / Mac / Linux
  2. Java 7.x or higher
  3. Eclipse – Latest version

3. Steps to make project

  1. Make a new java project
  2. Add the following JAR in the build path. You can find the jar files in the lib directory in Flink home:
    flink-dist_2.11-1.0.3
    flink-python_2.11-1.0.3
    log4j-1.2.17
    slf4j-log4j12-1.7.7

4. Apache Flink Wordcount program


import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;


public class FlinkProgram {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

      
        DataSet<String> rawdata = env.readTextFile("E:\\readme.txt"); //change path with your filepath of text file

        DataSet <Tuple2<String, Integer>> result = rawdata
            .flatMap(new Splitter())
            .groupBy(0)
            .sum(1);
//To print result we can call print method
        result.print();
    }

    public static class Splitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String wordToken : line.split(" ")) {
                out.collect(new Tuple2<String, Integer>(wordToken, 1));
            }
        }
    }
}

The ExecutionEnvironment provides methods to control the job execution and to access the data from other Environment.
DataSet represents the collection of elements of a specific type. The type can be String, Integer, Long and tuple like:

<Tuple2<String, Integer>>

In this Apache Flink wordcount program, we are using FlatMap APIs. In the flatMap function, we can write our custom business logic. It takes one element as an input and produces zero, one or more elements.

We have seen the practical implementation of Wordcount program in Apache Flink using eclipse IDE. You can run this program directly in eclipse using run option. You can also refer this link to understand What is Apache Flink?


Leave a comment

Your email address will not be published. Required fields are marked *

One thought on “Apache Flink – Run Wordcount program in Eclipse