Apache Flink Shell Commands Tutorial – A Quickstart for Beginners


This Apache Flink quickstart tutorial walks you through various Apache Flink shell commands. Apache Flink provides an interactive Scala shell in which you can run Flink commands that apply transformation operations to data.

This is a beginner's guide to Apache Flink, with a step-by-step list of the commands and operations used to interact with the Flink shell.

To learn more about Apache Flink, follow this comprehensive guide.

Apache Flink Shell Commands

Start the Flink Shell:
Before starting the Apache Flink shell you need to install Flink. To install it, follow this installation tutorial.
Apache Flink ships with an interactive Scala shell. To start it in local mode, run the following from the Flink installation directory:
[php] $ bin/start-scala-shell.sh local [/php]

i. Create DataSet

The following command creates a DataSet named “data” from a text file. You can then apply various transformation operations to the DataSet to process the data:
[php] val data = benv.readTextFile("../about-dataflair.txt") [/php]
You can download the data file from here.

ii. Print elements of dataset

This prints all the elements of the DataSet:
[php] data.print() [/php]

iii. Count the number of elements

To count the number of elements in the DataSet:
[php] data.count() [/php]

iv. Filter Operation

The filter operation applies custom business logic as a condition. Only the elements that satisfy the condition passed to filter() are kept, and the result is stored in dfData:
[php] val dfData = data.filter(_.contains("DataFlair"))
dfData.print()
dfData.count() [/php]

v. Map Operation

Process the DataSet using the Map operation. A Map transformation takes one input element at a time and returns exactly one output element, and it can carry custom business logic. Here we convert the data to upper case:
[php] val uData = data.map(_.toUpperCase)
uData.print() [/php]
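
The filter and map steps above can also be tried without the Flink shell or the data file. The following is a minimal sketch, assuming a plain Scala `Seq[String]` as a stand-in for the DataSet. The sample lines and the `FilterMapSketch` object are hypothetical and are not part of the tutorial's data. Scala collections offer `filter` and `map` methods that take the same kind of function as the DataSet calls used here.

```scala
// Stand-in sketch: a plain Scala Seq plays the role of the Flink DataSet.
// The sample lines below are made up for illustration.
object FilterMapSketch {
  val sample: Seq[String] = Seq(
    "DataFlair provides training",
    "Apache Flink is a stream processor",
    "Learn Flink with DataFlair"
  )

  // Same predicate as data.filter(_.contains("DataFlair")):
  // keeps only the lines for which the condition is true.
  def keepDataFlair(lines: Seq[String]): Seq[String] =
    lines.filter(_.contains("DataFlair"))

  // Same function as data.map(_.toUpperCase):
  // produces exactly one output element per input element.
  def upperCase(lines: Seq[String]): Seq[String] =
    lines.map(_.toUpperCase)

  def main(args: Array[String]): Unit = {
    keepDataFlair(sample).foreach(println)
    println(keepDataFlair(sample).size)
    upperCase(sample).foreach(println)
  }
}
```

Note that `contains` is case-sensitive, so a line that only contains "dataflair" in lower case would not pass the filter. Map never changes the number of elements, while filter can only keep or reduce it.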

Log Mining Use case Example in Flink

In this use case, we will analyze the log data of Hadoop and generate various insights. You can download the data from here.
i. Create DataSet
Create a DataSet from the input file. DataSet is the basic abstraction in Flink.
[php] val logs = benv.readTextFile("../hadoop_nm_logs") [/php]
ii. Print Log Records
[php] logs.print() [/php]
iii. Count the Number of Lines
[php] logs.count() [/php]
iv. Error Analysis
Analyze the logs to find out how many errors and exceptions they contain, so that corrective measures can be taken:
a. Filter the Errors
[php] val err_logs = logs.filter(_.toLowerCase.contains("exception"))
err_logs.count()
err_logs.print() [/php]
b. Filter the Warnings
[php] val warn_logs = logs.filter(_.toLowerCase.contains("warn"))
warn_logs.count()
warn_logs.print() [/php]
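
The two filters above follow the same pattern: lower-case each line, then test for a keyword. As a minimal sketch of that pattern, the following uses a plain Scala `Seq[String]` as a stand-in for the `logs` DataSet so that it runs without Flink. The sample log lines, the `LogSummarySketch` object and the `countMatching` helper are all hypothetical; the lines only imitate the general look of Hadoop logs and are not taken from the tutorial's data file.

```scala
// Stand-in sketch: a plain Scala Seq plays the role of the logs DataSet.
// The log lines below are invented for illustration.
object LogSummarySketch {
  val sampleLogs: Seq[String] = Seq(
    "2016-01-01 10:00:00,001 INFO  NodeManager: started",
    "2016-01-01 10:00:01,002 WARN  ContainerManager: slow heartbeat",
    "java.io.IOException: connection reset",
    "2016-01-01 10:00:02,003 WARN  LocalDirs: disk nearly full"
  )

  // Counts the lines that contain the keyword, ignoring case.
  // Mirrors logs.filter(_.toLowerCase.contains(keyword)).count in the shell.
  def countMatching(lines: Seq[String], keyword: String): Int =
    lines.count(_.toLowerCase.contains(keyword.toLowerCase))

  def main(args: Array[String]): Unit = {
    println("exceptions: " + countMatching(sampleLogs, "exception"))
    println("warnings:   " + countMatching(sampleLogs, "warn"))
  }
}
```

Because this is a substring match, a keyword such as "warn" also matches any line that merely contains those letters (for example in a class name or in the word "warning"), and a single line can be counted under both keywords. For stricter results, the predicate could test the log-level field instead.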
To study more real-world Apache Flink use cases, follow this Flink use-cases guide.

Wordcount Example in Flink

This section of the Apache Flink shell commands tutorial runs the classic word count example in Flink’s Scala shell. It counts the number of times each word occurs in the dataset:
i. Create DataSet
[php] val data = benv.readTextFile("../about-dataflair.txt") [/php]
ii. FlatMap Operation – Convert each line into words
[php] val words = data.flatMap { _.toLowerCase.split(" ") } [/php]
iii. Map Operation – Convert word into (word,1) pair
[php] val wordPair = words.map { (_, 1) } [/php]
iv. GroupBy – Group similar words together and sum the counts
[php] val result = wordPair.groupBy(0).sum(1) [/php]
v. Print the result
[php] result.print() [/php]
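
The word count steps above (flatMap, map, group, sum) can be sketched end to end on an in-memory sample. This is a minimal sketch, assuming plain Scala collections as a stand-in for the DataSet so that it runs without Flink; the `WordCountSketch` object and its input lines are hypothetical. It also differs from the shell commands in two labeled ways: it splits on any run of whitespace rather than a single space, and it drops empty tokens, which splitting on a single space can produce when a line contains consecutive spaces.

```scala
// Stand-in sketch: plain Scala collections play the role of the Flink DataSet.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+")) // line -> words (assumption: split on whitespace runs)
      .filter(_.nonEmpty)                   // assumption: drop empty tokens
      .map(word => (word, 1))               // word -> (word, 1) pair
      .groupBy(_._1)                        // group the pairs by word, like groupBy(0)
      .map { case (word, pairs) =>          // sum the counts per word, like sum(1)
        (word, pairs.map(_._2).sum)
      }

  def main(args: Array[String]): Unit = {
    val sample = Seq("Flink is fast", "flink is fun")
    wordCount(sample).foreach { case (word, count) => println(s"$word: $count") }
  }
}
```

The order in which the pairs are printed is not guaranteed, because the result is an unordered map.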
Create your first Apache Flink application in Java with Eclipse using this Flink guide.

Conclusion – Flink Shell Commands

In this Apache Flink shell commands tutorial, we walked through the basic shell commands with the help of examples such as log mining and word count. If you have any questions about Apache Flink shell commands, ask in the comments.
