AWK Command in Linux

In this article, you will learn all there is to the AWK command in Linux-based operating systems. We will go through what the awk command is, why it is used, how it works, syntax, options, and the built-in variables of the awk command.

We will also look into the technical side of the awk command as we go through awk patterns, awk statements, awk variables, awk actions, and finally some practical commands of the awk command in the terminal. In this article, we will look into some of the coolest features of the awk command, so pay attention and read right to the end!

What is AWK?

Awk is more than a command-line utility: it is a scripting language used for manipulating text and generating reports. Unlike C, the awk language needs no compiling, because the awk interpreter runs the program directly. The name AWK comes from the surnames of its developers: Aho, Weinberger, and Kernighan.

Awk is designed for advanced text processing and is mostly used as a tool for reporting and analysis. Since awk is data-driven, you can define a set of actions to be performed against the input text. Awk takes in and transforms the input data, and sends it to stdout (standard output).

Programmers use the awk scripting language widely because a short program, often a single statement, is enough to describe the text patterns to look for and what to do with them.

How does awk work?

The primary purpose of the awk command is to make text manipulation and information retrieval easy on Linux distributions. The command scans the input line by line and searches for the lines that match the pattern specified by the user.

The awk command accepts input data, transforms it, and sends the result to standard output. For each pattern, the user can describe an action to perform on every matching line. This lets awk turn complex log files into readable output.
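A minimal sketch of this pattern-and-action cycle is shown below. The log lines and the ERROR keyword are made up for illustration and are supplied inline, so no file is needed:

```shell
# Three hypothetical log lines are piped into awk.
# Pattern: /ERROR/ selects the matching lines.
# Action:  print the first and third fields of each selected line.
printf '10:01 INFO started\n10:02 ERROR disk\n10:03 ERROR network\n' |
  awk '/ERROR/ { print $1, $3 }'
```

Only the two ERROR lines are selected, so the INFO line produces no output.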

Why use awk?

As we have seen, awk is used for processing and manipulating text. It lets a programmer write tiny programs, each in the form of a statement that specifies the text patterns to search for in each line of a file.

Awk also has the ability to search more than one file to see if they contain the lines that match the specified pattern and perform the defined actions.

Let us look at what we can do with the awk command:

1. Operations AWK can perform

a. Scans files line by line.

b. Divides each input line into fields.

c. Compares the input lines or their fields to patterns.

d. Performs the specified actions on the lines that match.

2. What is awk useful for?

a. Transform data files

b. Produce formatted reports

c. Processing text

d. Manipulating text

3. Programming constructs

a. Producing formatted output lines

b. Performing arithmetic and string operations

c. Using loops and conditionals
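Several of the capabilities listed above can be seen together in one short sketch. The item names, quantities, and prices are invented sample data; the program does arithmetic on the fields, formats each line with printf, and prints a total at the end:

```shell
# Each input line is: item quantity unit-price (hypothetical data).
printf 'pen 3 1.50\nbook 2 12.00\n' |
  awk '{
         cost = $2 * $3                      # arithmetic on two fields
         sum += cost                         # running total across records
         printf "%-6s %6.2f\n", $1, cost     # formatted output line
       }
       END { printf "%-6s %6.2f\n", "TOTAL", sum }'
```

The format string "%-6s %6.2f" left-aligns the name in six columns and prints the amount with two decimal places, which is what turns raw data into a small report.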

Syntax of the awk command

When you first look at the syntax of the awk command, it may look a little intimidating, but once you understand what each field is for, and see a few examples, you will get the hang of it. The syntax for the awk command is:

awk <options> '<selection criteria> {<action>}' <input filename> > <output file>

Let us look at each field in the syntax briefly:

1. Options

This field takes in the options available with the awk command, which we will look into in the next section.

2. Selection criteria

This field specifies the pattern of string we are searching for.

3. Action

This field specifies the action to carry out when a match is found.

4. Input filename

This field specifies the name of the file in which to search.

5. Output file

This field specifies the name of the file that receives the formatted or manipulated output. The shell handles this part: the > operator redirects the standard output of awk into the file.
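To make the five fields concrete, here is a small sketch that fills in each one. The file names users.csv and admins.txt and their contents are assumptions made up for this example:

```shell
# Work in a throwaway directory; all file names here are hypothetical.
dir=$(mktemp -d)
printf 'alice,admin\nbob,clerk\ncarol,admin\n' > "$dir/users.csv"

# options:            -F ','
# selection criteria: $2 == "admin"
# action:             { print $1 }
# input filename:     users.csv
# output file:        admins.txt (written through shell redirection)
awk -F ',' '$2 == "admin" { print $1 }' "$dir/users.csv" > "$dir/admins.txt"

cat "$dir/admins.txt"
rm -r "$dir"
```

After the command runs, admins.txt contains the names alice and carol, one per line.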

Options used with the awk command

We have seen that in the syntax of the awk command, there is a field for entering options. Options specify how awk should work or how the output should be formatted. Let us look into the options available with the awk command.

1. -f <program file>

This option reads the awk program source from the file specified instead of the first command-line argument.

2. -F <fs>

This option uses fs for the input field separator.
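Both options can be combined in a single call. The sketch below assumes a program file named first.awk (a hypothetical name) and colon-separated sample input:

```shell
dir=$(mktemp -d)

# Store a one-rule awk program in a file (first.awk is a made-up name).
printf '{ print $1 }\n' > "$dir/first.awk"

# -F ':' splits each record on colons; -f reads the program from the file.
printf 'root:x:0\ndaemon:x:1\n' | awk -F ':' -f "$dir/first.awk"

rm -r "$dir"
```

With the colon as the separator, the first field of each line is the text before the first colon, so the command prints root and daemon.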

Fields and records

In earlier sections, we mentioned that awk divides input lines into fields. Let us look at what fields and records are in slightly greater depth. We saw that awk processes text files and streams, and divides this input data into records and fields.

Awk operates on only one record at a time. Records are separated by a character called the record separator, and the default record separator is a newline. This means that each line in the text file is a record.

Each record consists of fields. Just like records, fields are also separated by a separator, except in this case the default separator is whitespace (spaces or tabs). Fields in each record are referred to using a dollar sign ($) followed by the field number, starting with 1. In layman’s language, a record is a sentence and a field is a word.
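The record and field split can be observed directly with a short sketch. The two input lines are arbitrary sample text:

```shell
# Each input line is one record; whitespace splits it into fields.
printf 'hello world\nfoo bar baz\n' |
  awk '{
         print "whole:  " $0    # the entire record
         print "first:  " $1    # field number 1
         print "second: " $2    # field number 2
       }'
```

The three print statements run once per record, so the output has six lines: the whole line, its first word, and its second word, for each of the two input lines.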

Statements of the AWK command

Just like every programming language, awk has the standard control-flow statements such as if-else, while, for, and many more. In this section, we will look at each of them in detail.

If-else statement

The if-else statement works just as you would expect: if a condition is true, awk performs one set of instructions, and if it is false, it performs another.

Let us write a simple program in one line: to check if the second field and 3rd field of each record are the same, if they are we will print what they are and if not we will print a message saying they are not equal.

The program described above can be written as the following command:

awk -F ',' '{if($2==$3){print $1","$2","$3} else {print "Not equal"}}' if.txt

To execute this program, we need a sample text file. For the time being, let us consider the following text file:

sample text file

And upon running the command above, we get the expected output.

linux awk command output

While loop

The while loop also works just as anticipated: it repeats a task for as long as the specified condition is true. Let us write a simple while loop that breaks every record down into its fields and precedes each one with its number. Because the counter starts at 0, the first line printed for each record is $0, the whole record. To do so, use the command shown below:

awk '{i=0; while(i<=NF) { print i ":"$i; i++;}}' if.txt

For this statement also let us take the text file we used in the previous example.

sample text file

Upon running the while loop, we get the expected output:

while loop with the awk command

For loop

The for loop works like the while loop, repeating a set of actions only as long as the condition is true, but it keeps the initialization, the condition, and the increment together on one line. Let us write a simple program that prints the squares of the numbers from 1 to 10 using a for loop. To do so, use the command shown below:

awk 'BEGIN{for(i=1; i<=10; i++) print "The square of", i, "is", i*i;}'

This specific example does not need a text file to work on, as we are printing directly on the terminal and have no input data. Upon running the command, we get the following output:

for loop with the awk command

Break statement

The break statement is used to exit a loop when a specific condition is met. For example, let us write code to print the word “Example” only 5 times (using the break statement), which means the 6th time the command goes through the loop, it should break away from it. To do so, use the command shown below:

awk 'BEGIN{x=1; while(1) {if ( x==6 ) break; print "Example"; x++; }}'

This example does not need a text file either, as we are printing directly on the terminal and have no input data. Upon running the command, we get the following output:

break statement with the awk command

Patterns of AWK

You can insert a pattern in front of an action in an awk program so that it acts as a selector. The selector determines whether to perform the specified action or not. Let us look at the different types of patterns:

1. Regular expressions

2. Arithmetic and relational expressions

3. String valued expressions

4. Arbitrary boolean combinations of the above expressions.

Let us look at each expression in detail.

Regular expressions using Linux AWK Command

Regular expressions are the simplest form of pattern: a string of characters enclosed in slashes (/). For example, let us write a command to print the lines whose first field starts with a capital ‘A’ in a specific text file. To do so, type the command shown below in the terminal:

awk '$1 ~ /^A/ {print $0}' expression.txt

Upon running the command, we get the expected output: every line whose first field starts with the letter “A” gets printed.

regular expressions of awk

Relational expressions using AWK Linux

As the name suggests, relational expressions compare two quantities. They include:

  • < – less than
  • <= – less than or equal to
  • == – equal to
  • != – not equal to
  • >= – greater than or equal to
  • > – greater than

Let us write a program to check if 2 numbers are equal or not, to do so, type the command shown below in the terminal:

awk 'BEGIN { a = 10; b = 10; if (a == b) print "a is equal to b" }'

Since the numbers are equal the command prints “a is equal to b”:

relational expressions using awk

Range patterns using Linux AWK

Another type of pattern is the range pattern, which consists of two comma-separated patterns. The range pattern performs the specified actions for each record from the one that matches the first pattern through the next one that matches the second pattern, inclusive.

For example, the command shown below prints the first two fields of every line from the first line containing “clerk” through the next line containing “manager”, not the other way round:

awk '/clerk/, /manager/ {print $1, $2}' expression.txt

range patterns using awk
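Since the contents of expression.txt are not shown in the text, the same range pattern can be tried on inline sample data. The four name and job pairs below are invented for illustration:

```shell
# The range opens at the first line matching /clerk/ and closes at the
# next line matching /manager/; both boundary lines are included.
printf 'ann clerk\nbob cook\ncat manager\ndan driver\n' |
  awk '/clerk/, /manager/ { print $1, $2 }'
```

The line for dan comes after the range has closed, so it is the only one not printed.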

Special expression patterns using AWK

The special patterns “BEGIN” and “END” mark actions that run before the first record is read and after the last record has been processed. For example, the command shown below will print the message “List of employees:” at the beginning and “End of list” at the end:

awk 'BEGIN { print "List of employees:" }; {print $1, $2}; END {print "End of list"}' expression.txt

special expression patterns using awk

Combining patterns

We can also combine 2 or more expressions using the following connectors:

  • || – or
  • && – and
  • ! – not
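A short sketch of the three connectors, using invented records of the form name, job, age. Each rule is labeled in its output so it is clear which one fired:

```shell
printf 'ann clerk 30\nbob manager 45\ncat clerk 52\n' |
  awk '$2 == "clerk" && $3 > 40      { print "and:", $1 }   # both must hold
       $2 == "manager" || $3 < 35    { print "or:",  $1 }   # either may hold
       !/clerk/                      { print "not:", $1 }   # line lacks "clerk"'
```

Every rule is tested against every record, so a single line such as the one for bob can trigger more than one rule.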

AWK variables

The awk command has many built-in variables, along with field references that denote the parts of a record, for example:

  • $0 – represents the whole line
  • $1 – represents the first field (word) of a line
  • $2 – represents the second field (word) of a line

We have already used the above variables in the above examples, let us look at some other new variables in this section:

1. NR

This variable holds the number of input lines read so far. For example, if we have to print each line along with its line number, we can use the command:

awk '{print NR,$0}' expression.txt

using nr variable of awk

2. NF

The “NF” variable holds the number of fields (words) in the current input record. Because of this, $NF refers to the last field of each line.

For example, the command shown below will print the last word of every line in the text file “expression”:

awk '{print $NF}' expression.txt

using nf variable of awk

3. FS

The “FS” variable contains the separator used to divide an input line into fields. The default separator is whitespace, but you can set the “FS” variable to change the separator to another character.

using fs variable of awk
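One common way to set FS is inside a BEGIN rule, so that it takes effect before the first line is split. A minimal sketch with made-up colon-separated input:

```shell
# FS is set to a colon before any input is read.
printf 'root:x:0:0\nalice:x:1000:1000\n' |
  awk 'BEGIN { FS = ":" } { print $1, $3 }'
```

This prints the first and third colon-separated fields of each line, and has the same effect as passing -F ':' on the command line.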

4. RS

The “RS” variable stores the current record separator character. The default record separator is a new line.
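As an illustration, the sketch below changes RS to a semicolon, so that a single line of made-up input is read as three records:

```shell
# With RS set to ";", records end at semicolons instead of newlines.
printf 'a b;c d;e f' |
  awk 'BEGIN { RS = ";" } { print NR ": " $1 }'
```

NR counts three records here even though the input contains no newline at all, and $1 is the first word of each record.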

5. OFS

The “OFS” variable stores the output field separator.

For example, the command shown below will separate the 2nd word and 1st word by the text “is a”.

awk 'BEGIN {OFS=" is a "} {print $2,$1}' expression.txt

using ofs variable

6. FNR

This variable contains the ordinal number of the current record in the current file.

7. FILENAME

This variable contains the name of the current input file.
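The difference between NR and FNR only shows up when awk reads more than one file. The sketch below creates two small files (one.txt and two.txt, both hypothetical) and prints FILENAME, NR, and FNR for every line:

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n'    > "$dir/two.txt"

# Run inside the directory so FILENAME shows the short file names.
# NR keeps counting across files; FNR restarts at 1 for each file.
(cd "$dir" && awk '{ print FILENAME, NR, FNR, $0 }' one.txt two.txt)

rm -r "$dir"
```

For the single line of two.txt, NR is 3 while FNR is back to 1.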

8. ORS

This variable contains the output record separator. Just as OFS separates fields in the output, ORS separates lines.
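A small sketch of ORS in use: three input lines are joined onto one output line. The " | " separator is an arbitrary choice for the example:

```shell
# print appends ORS after every record, so each line is followed by " | ".
# The END rule adds one real newline so the shell prompt starts cleanly.
printf 'a\nb\nc\n' |
  awk 'BEGIN { ORS = " | " } { print } END { printf "\n" }'
```

Note that ORS is written after every record, including the last one, so the output line ends with a trailing separator.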

9. OFMT

This variable contains the output format for numbers.
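The effect of OFMT can be seen by printing the same non-integer value before and after changing it. The "%.2f" format is just an example value:

```shell
# The first print uses the default OFMT ("%.6g");
# the second uses two decimal places.
awk 'BEGIN { print 22 / 7; OFMT = "%.2f"; print 22 / 7 }'
```

The first line shows six significant digits (3.14286) and the second shows 3.14. Integer values are not affected by OFMT.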

10. SUBSEP

This variable contains the character used to separate multiple subscripts in an array index.
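A minimal sketch of where SUBSEP appears. The array name grid and its contents are made up; the point is that a two-part subscript is stored as one string with SUBSEP in the middle:

```shell
awk 'BEGIN {
  grid[1, 2] = "x"                 # stored under the key "1" SUBSEP "2"
  for (key in grid) {
    split(key, part, SUBSEP)       # recover the two subscripts
    print part[1], part[2], grid[key]
  }
  if ((1, 2) in grid) print "found"
}'
```

Splitting the key on SUBSEP gives back the original subscripts 1 and 2, and the (1, 2) in grid test checks for the combined key without creating it.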

11. ARGC

This variable contains the count of the command-line arguments.
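ARGC is usually read together with the ARGV array, which holds the arguments themselves. In the sketch below, the arguments one and two are arbitrary words; because the program has only a BEGIN rule, awk never tries to open them as files:

```shell
# ARGC counts the command name plus the two arguments.
awk 'BEGIN { print ARGC; for (i = 1; i < ARGC; i++) print i, ARGV[i] }' one two
```

The count is 3 because ARGV[0] holds the name of the awk program itself, which is why the loop starts at index 1.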

AWK actions

We have already seen that the syntax of awk has a field that accepts the actions. We can also specify just an action, with no pattern, and awk will run it for every input line.

For example, the command shown below will simply print the message “This is an action” on the terminal:

awk '{print "This is an action"}'

awk actions

The command will execute every time you hit “Enter”. To terminate it, press “Ctrl” + “D”.

Built-in functions of the AWK command

The AWK command comes with a lot of built-in functions like sqrt and atan2:

awk 'BEGIN { print sqrt(625)}'

awk 'BEGIN { print sqrt((2+3)*5)}'

awk 'BEGIN {print atan2(0, -1)}'

awk 'BEGIN {print atan2(0, -1)*100}'

built in functions of awk command

The “BEGIN” and “END” rules

The ‘BEGIN’ rule is executed at the very beginning of the awk command. In fact, it is the first thing to even be executed, even before awk processes the input text.

The ‘END’ rule is executed after all the input has been processed. You can have multiple ‘BEGIN’ and ‘END’ rules in an awk command, and they will execute in the order in which they appear.
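The ordering can be checked with a small sketch that uses two BEGIN rules, one main rule, and two END rules. The input numbers are arbitrary:

```shell
printf '4\n6\n' |
  awk 'BEGIN { print "first BEGIN" }
       BEGIN { print "second BEGIN" }
       { sum += $1 }                    # runs once per input line
       END { print "sum = " sum }
       END { print "second END" }'
```

Both BEGIN messages appear before any input is read, and the sum of 10 is only available in the END rules, after the last line has been processed.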

Scripting using awk

Instead of simply writing single-line commands, we can use awk to create a script program that can be run later.

Let us write a simple script program and save it as a “.awk” file:

writing a script program using the awk command
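The script itself appears only as a screenshot, so its exact contents are not reproduced here. The following is a sketch of what such a script might look like, assuming it counts the entries of a passwd-style file. The file name count.awk, the variable name accounts, and the output wording are all hypothetical:

```shell
dir=$(mktemp -d)

# Write a hypothetical script file. The shebang line assumes that awk is
# installed at /usr/bin/awk, which may differ on your system.
cat > "$dir/count.awk" <<'EOF'
#!/usr/bin/awk -f
BEGIN { FS = ":" }
{ accounts++ }
END { print accounts, "accounts found" }
EOF

# Running it through "awk -f" works even before the file is executable.
printf 'root:x:0:0\nalice:x:1000:1000\n' | awk -f "$dir/count.awk"

rm -r "$dir"
```

The sample run feeds two made-up passwd-style lines through the script instead of the real /etc/passwd, so the count it reports is 2.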

After you create the file, add the permission to execute it by using the command “chmod +x <filename>”.

giving the file execute permissions

Then execute the file by using the command: ./<name>.awk /etc/passwd

output of the shell program

We passed the /etc/passwd file as an argument. After successful execution, the script prints the number of accounts, just as anticipated.

Using all of this basic knowledge, we can use the awk command to do different tasks like:

Filtering using Linux AWK

awk 'length($0) > 8' /etc/shells

filtering using the awk command

In the above command, we passed the /etc/shells system file as an argument and filtered the output to contain only the lines longer than 8 characters.

Counting using Linux AWK Command

awk '{ print "The number of characters in line", NR,"=" length($0) }' employees.txt

counting using the awk command

In the above command, we counted the number of characters in each line using the length function, and used the NR variable to print the line number.

Apart from these, we can perform many tasks using the AWK command.

Summary

As you have seen, the Linux awk command is a really beautiful command that lets you do complex tasks in just 1 line of code! It is not necessarily simple, but it greatly improves workflow efficiency, as it can easily process and manipulate the text of large logs and files.

You have now learned what AWK is, why we use it, how it works, the syntax, and the options of the AWK command. We have also seen the technical part of it as we went through the statements (if-else, while, for, break), patterns, variables, actions, and how to script using the AWK command.
