Apache Flink Shell Commands Tutorial – A Quickstart for Beginners

1. Objective

This Apache Flink quickstart tutorial walks you through the most useful Apache Flink shell commands. Apache Flink provides an interactive shell (a Scala prompt) where you can run Flink commands and apply transformation operations to process data. This is a beginner's guide with a step-by-step list of Flink shell commands and operations.
To learn more about Apache Flink, follow this comprehensive guide.


2. Apache Flink shell Commands

Start the Flink Shell:
Before starting the Apache Flink shell you need to install Flink; to do so, follow this installation tutorial.
Apache Flink ships with an interactive command shell (the Flink Scala shell). To start it in local mode:
$ bin/start-scala-shell.sh local

i. Create DataSet

The following command creates a DataSet named “data” (benv is the batch execution environment that the shell pre-binds for you). On a DataSet you can perform various transformation operations to process the data:
val data = benv.readTextFile("../about-dataflair.txt")
You can download the data file from here.

ii. Print elements of dataset

This prints all the elements of the DataSet:
data.print

iii. Count the number of elements

To count the number of elements in the DataSet:
data.count

iv. Filter Operation

Perform a filter operation based on custom business logic. Elements that satisfy the condition in filter() are kept, and the result is stored in dfData:

val dfData = data.filter(_.contains("DataFlair"))
dfData.print
dfData.count
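Outside the Flink shell, the same filter semantics can be sketched with plain Scala collections (the sample lines below are made up for illustration; they stand in for the contents of about-dataflair.txt):

```scala
// Hypothetical sample lines standing in for the input file.
val data = Seq(
  "DataFlair provides Big Data training",
  "Apache Flink is a stream processor",
  "Learn Flink at DataFlair"
)

// filter() keeps only the elements for which the predicate is true,
// just like DataSet.filter in the Flink shell.
val dfData = data.filter(_.contains("DataFlair"))

dfData.foreach(println)
println(dfData.size) // 2
```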


v. Map Operation

Process the DataSet using the Map operation. A Map transformation takes one input element at a time and returns exactly one output element; you can write custom business logic inside it. Here we convert the data to upper case:

val uData = data.map(_.toUpperCase)
uData.print
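The one-in, one-out behavior of Map can be seen on plain Scala collections as well (sample input invented for illustration):

```scala
// Hypothetical sample input for illustration.
val data = Seq("hello flink", "hello scala")

// map() produces exactly one output element per input element,
// mirroring DataSet.map in the Flink shell.
val uData = data.map(_.toUpperCase)

uData.foreach(println)
```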

3. Log Mining Use Case Example in Flink

In this use case, we will analyze Hadoop log data and generate various insights. You can download the data from here.
i. Create DataSet
Create DataSet of an input file. DataSet is the basic abstraction in Flink.
val logs = benv.readTextFile("../hadoop_nm_logs")
ii. Print Log Records
logs.print
iii. Count the Number of Lines
logs.count
iv. Error Analysis
Analyze the logs to find out how many errors/exceptions there are, so that corrective measures can be taken:
a. Filter the Errors

val err_logs = logs.filter(_.toLowerCase.contains("exception"))
err_logs.count
err_logs.print

b. Filter the Warnings

val warn_logs = logs.filter(_.toLowerCase.contains("warn"))
warn_logs.count
warn_logs.print
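The same case-insensitive matching can be sketched with plain Scala collections; the log lines below are made up for illustration and stand in for the Hadoop log file:

```scala
// Made-up log lines standing in for the Hadoop log file.
val logs = Seq(
  "2023-01-01 INFO Container started",
  "2023-01-01 ERROR java.io.IOException: disk failure",
  "2023-01-01 WARN Low memory",
  "2023-01-02 WARN Slow heartbeat"
)

// Same case-insensitive substring matching as the DataSet filters above.
val errLogs  = logs.filter(_.toLowerCase.contains("exception"))
val warnLogs = logs.filter(_.toLowerCase.contains("warn"))

println(errLogs.size)  // 1
println(warnLogs.size) // 2
```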

To study more real-world Apache Flink use cases, follow this Flink use-cases guide.

4. Wordcount Example in Flink

In this section of the Apache Flink shell commands tutorial, the classic wordcount example is run in Flink's Scala shell. It counts the number of times each word occurs in the dataset:
i. Create DataSet
val data = benv.readTextFile("../about-dataflair.txt")
ii. FlatMap Operation – Convert line into words
val words = data.flatMap { _.toLowerCase.split(" ") }
iii. Map Operation – Convert word into (word,1) pair
val wordPair = words.map { (_, 1) }
iv. GroupBy – Group similar words together and sum the counts
val result = wordPair.groupBy(0).sum(1)
v. Print the result
result.print
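To see what each stage of the pipeline does, the same wordcount can be sketched on plain Scala collections (the input line is invented for illustration; groupBy plus a sum over the counts plays the role of DataSet.groupBy(0).sum(1)):

```scala
// Hypothetical sample line for illustration.
val data = Seq("to be or not to be")

// flatMap splits each line into words (possibly many per input element).
val words = data.flatMap(_.toLowerCase.split(" "))

// map builds a (word, 1) pair for every word.
val wordPair = words.map((_, 1))

// Group pairs by the word (field 0) and sum the counts (field 1),
// mirroring groupBy(0).sum(1) in the Flink shell.
val result = wordPair.groupBy(_._1).map { case (w, pairs) => (w, pairs.map(_._2).sum) }

result.toSeq.sortBy(_._1).foreach(println)
```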
Create your first Apache Flink Application in java with eclipse using this Flink guide.

5. Conclusion – Flink Shell Commands

Hence, in this Apache Flink shell commands tutorial, we discussed the shell commands with the help of examples such as log mining and Flink wordcount. Still, if you have any query regarding Apache Flink shell commands, ask in the comment section.
