Run Apache Flink Wordcount Program in Eclipse

Free Flink course with real-time projects Start Now!!

In our previous guides, we discussed how to install Apache Flink on ubuntu. In this tutorial, we will understand how to develop and run Apache Flink wordcount program in Java in eclipse.

We can also use Scala language to write wordcount program in Apache Flink. To learn Scala get the best Scala Books from here.

Platform

  1. Operating system: You can run the code in Windows / Mac / Linux
  2. Java 7.x or higher
  3. Eclipse – Latest version

Steps to make project

  1. Make a new java project
  2. Add the following JAR in the build path. You can find the jar files in the lib directory in Flink home:
    flink-dist_2.11-1.0.3
    flink-python_2.11-1.0.3
    log4j-1.2.17
    slf4j-log4j12-1.7.7

Apache Flink Wordcount program

[php]
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;
public class FlinkProgram {
public static void main(String[] args) throws Exception {
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> rawdata = env.readTextFile(“E:\\readme.txt”); //change path with your filepath of text file
DataSet <Tuple2<String, Integer>> result = rawdata
.flatMap(new Splitter())
.groupBy(0)
.sum(1);
//To print result we can call print method
result.print();
}
public static class Splitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
for (String wordToken : line.split(” “)) {
out.collect(new Tuple2<String, Integer>(wordToken, 1));
}
}
}
}
[/php]

The execution environment provides methods to control the job execution and to access the data from other Environment.

DataSet represents the collection of elements of a specific type. The type can be String, Integer, Long and tuple like:

<Tuple2<String, Integer>>

In this Apache Flink wordcount program, we are using FlatMap APIs. In the flatMap function, we can write our custom business logic. It takes one element as an input and produces zero, one or more elements.

We have seen the practical implementation of Wordcount program in Apache Flink using eclipse IDE. You can run this program directly in eclipse using run option. You can also refer this link to understand What is Apache Flink?

Conclusion

Hence, in this Apache Flink tutorial, we have discussed the Apache Flink Wordcount program. Moreover, we saw the steps and platform to make the project. Still, if you have any problem in running the Apache Flink Wordcount Program, ask in the comment tab.

Did we exceed your expectations?
If Yes, share your valuable feedback on Google

follow dataflair on YouTube

1 Response

  1. prakash kvs says:

    Please send me the details to my mail id

Leave a Reply

Your email address will not be published. Required fields are marked *