Apache Flink Shell Commands Tutorial – A Quickstart For beginners 1


1. Objective

This Apache Flink quickstart tutorial will take you through various apache Flink shell commands. Apache flink provides an interactive shell / Scala prompt where the user can run flink commands for different transformation operations to process data. This is an Apache Flink beginners guide with step by step list of flink commands /operations to interact with flink shell.

To learn more about Apache Flink follow this comprehensive Guide.

apache flink shell commands tutorial guide

2. Apache Flink shell Commands

Start the Flink Shell:

Before starting Apache Flink shell you need to install Flink, to install Flink follow this installation tutorial.

Apache Flink is shipped with interactive command shell or Flink-scala shell prompt, to start Flink shell:

 $bin/start-scala-shell.sh local 

2.1. Create DataSet

It will create a DataSet with name “data”. On DataSet you can perform various transformation operations to process data:

 val data = benv.readTextFile("../about-dataflair.txt") 

You can download the data file from here.

2.2. Print elements of dataset

It will print all the elements of DataSet:

 data.print 

2.3. Count the number of elements

To count the number of elements of DataSet:

 data.count 

2.4. Filter Operation

Perform the filter operation based on custom business logic, based on the condition in the filter() data will be filtered and the result will be stored in dfData:

 val dfData = data.filter(_.contains("DataFlair"))
dfData.print
dfData.count 

2.5. Map Operation

Process the DataSet using Map operation, Map transformation takes one input at a time and returns 1 element. We can write custom business logic in the Map operation. Here we are converting data to upper case:

 val uData = data.map(_.toUpperCase)
uData.print 

3. Log Mining Use case Example in Flink

In this use case, we will analyze the log data of Hadoop and generate various insights. You can download the data from here.

i. Create DataSet

Create DataSet of an input file. DataSet is the basic abstraction in Flink.

 val logs = benv.readTextFile("../hadoop_nm_logs") 

ii. Print Log Records

 logs.print 

iii. Count the Number of Lines

 logs.count 

iv. Error Analysis

Analyze the logs and find out how many errors/exceptions are there. We can analyze the logs and find out the exceptions to take corrective measures-

a. Filter the Errors

 val err_logs = logs.filter(_.toLowerCase.contains("exception"))
err_logs.count
err_logs.print 

b. Filter the Warnings

 val warn_logs = logs.filter(_.toLowerCase.contains("warn"))
warn_logs.count
warn_logs.print 

To study more Apache Flink real world use case follow this flink use-cases guide.

4. Wordcount Example in Flink

In this section of Apache Flink shell commands tutorial, Classic Wordcount example is explained to run in Flink’s Scala shell/prompt. Count the number of times a specific word has occurred in the dataset-

i. Create DataSet

 val data = benv.readTextFile("../about-dataflair.txt ") 

ii. FlatMap Operation – Convert line into words

 val words = data.flatMap { _.toLowerCase.split(" ") } 

iii. Map Operation – Convert word into (word,1) pair

 val wordPair = words.map { (_, 1) } 

iv. GroupBy – Group similar words together and sum the counts

 val result = wordPair.groupBy(0).sum(1) 

v. Print the result

 result.print 

Create your first Apache Flink Application in java with eclipse using this Flink guide.


Leave a comment

Your email address will not be published. Required fields are marked *

One thought on “Apache Flink Shell Commands Tutorial – A Quickstart For beginners