Gawk Command in Linux
In this article, you will learn about the gawk command in Linux-based operating systems. We will look at the gawk command, why it is used, how to install it, its syntax, and the options used along with it. We will also look at examples of the gawk command by pairing it with various options to understand its working. So pay attention, take notes, and read to the end for the best benefits.
What is the gawk command in Linux?
Gawk is GNU's implementation of AWK, a pattern scanning and processing language, and it is available as a command-line utility in Linux-based operating systems. Gawk programs need no compiling because they are interpreted directly, and the language lets the user work with variables, numeric functions, string functions, logical operators, and more.
Just like the awk command, gawk also enables programmers to write tiny and effective programs in the form of statements that define text patterns to be searched for in a text document and the action to be taken when a match is found within a line.
The gawk command may seem simple, but it is capable of many things. For example, it can scan files line by line, split input lines into fields, transform data files, and produce formatted output. It can also handle arithmetic operations, string operations, conditional statements, loops, and much more.
How does gawk work in Linux?
The primary purpose of the gawk command is to make text manipulation and information retrieval an easy job in Linux distributions. The command works by scanning a set of input lines and then searching for the lines that match the pattern specified by the user.
The gawk command accepts input data, transforms it, and sends the result to standard output. For each pattern, the user describes an action that gawk performs on every line matching that pattern. This makes it easy to process complex log files and produce readable output.
What is the syntax of the gawk command?
The syntax of the gawk command might look slightly intimidating at first, but once we understand the fields in its syntax, it becomes a cakewalk. The syntax of the gawk command is as shown below.
gawk <options> -f <program file> <input file(s)>
gawk <options> '<program text>' <input file(s)>
Let us see the fields present in the gawk command's syntax.
1. <options>
This field takes in a range of options that control how the gawk command runs and how it formats and prints its output. You can write each option in either POSIX style (a single dash, such as -f) or GNU style (two dashes, such as --file).
2. <program file>
It takes the name of the file that contains your AWK program, which you pass with the “-f” option. Alternatively, you can write a short program directly on the command line, enclosed in single quotes, without using a program file.
Options used with the gawk command
The gawk command supports quite a few options. Let us take a brief look at the most commonly used ones.
1. -f
As we discussed above, this option reads the AWK program source from the file you specified instead of from the first command-line argument. You can also write this option as “--file”.
2. -F
This option sets the input field separator to the value you give it (for example, -F ':'), which is the same as setting the FS variable. This option can also be written as “--field-separator”.
3. -v
Before the execution of the program begins, this option assigns the value you specified to the variable you chose, written as var=value. This option can also be written as “--assign”.
4. -b
This option treats all input data as single-byte characters. You can also write this option as “--characters-as-bytes”.
5. -c
This option runs gawk in compatibility mode, where it behaves like the traditional Unix awk and none of the GNU-specific extensions are recognized. This option can also be written as “--traditional”.
6. -d
This option prints a sorted list of global variables, their types, and final values to the file you specify (awkvars.out if no file is given). This option can also be written as “--dump-variables”.
7. -C
This option prints the short version of the GNU copyright information message. This option can also be written as “--copyright”.
8. -e
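This option uses the text that follows it as AWK program source code. It allows the easy intermixing of library functions (loaded with -f) with source code entered on the command line, which is mainly useful for medium to large AWK programs used in shell scripts. You can also write this option as “--source”.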
9. -g
This option scans and parses the AWK program, and generates a GNU .pot (Portable Object Template) format file on standard output with entries for all localizable strings in the program. This option can also be written as “--gen-pot”.
10. -L
This option provides warnings about constructs that are dubious or non-portable to other AWK implementations. This option can also be written as “--lint”.
11. -n
This option recognizes octal and hexadecimal values in input data. We can also write this as “--non-decimal-data”.
12. --help
This option displays a short usage summary of the gawk command and its options. It can also be written as “-h”.
13. -O
This option enables gawk's optimisations upon the internal representation of the program. This option can also be written as “--optimize”.
14. -r
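This option enables the use of interval expressions (such as {2,4}) in regular expression matching. Recent versions of gawk enable them by default, so the option mainly matters together with --traditional. You can also write this option as “--re-interval”.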
15. -N
This option forces the gawk command to use the locale’s decimal point character when parsing input data. You can also write this option as “--use-lc-numeric”.
16. -V
This option displays the version of gawk you are using. It can also be written as “--version”.
Default behaviour of the gawk command
Let us consider a text file that contains five names along with their phone numbers.
Now, if we use the gawk command shown below, it will print the file’s contents:
gawk '{print}' <filename>.txt
Printing lines that match a specific pattern
Like the grep command, the gawk command can print only the lines that match a specific pattern. To do so, run gawk with the syntax shown below:
gawk '/<pattern>/ {print}' <filename>
Printing only a specific column of the file
To print only a specific column of a file, use the command shown below. By default, gawk splits each line into columns (fields) on runs of spaces and tabs:
gawk '{print $<column number>}' <filename>.txt
Numbering the lines
If you want to display the line number at the start of each line, you can use the gawk command as shown below:
gawk '{print NR, $0}' <filename>.txt
Finding the length of the longest line present in the file
If you want to find out the length of the longest line that is present in a file, you can use the command shown below:
gawk '{ if (length($0) > max) max = length($0) } END { print max }' <filename>.txt
Counting the number of lines in the file
To count the number of lines in a file, use the following command:
gawk 'END { print NR }' <filename>.txt
Printing lines longer than a specific number of characters
If you want to print only the lines that contain more than a specified number of characters, you can use the command shown below:
gawk 'length($0) > <number of characters>' <filename>.txt
Built-in variables of linux gawk command
The gawk command also has several built-in variables used for different purposes. Let us take a brief look at them, along with some examples.
1. NR: It holds the number of the current input record, which is normally the current line number.
Example:
gawk '{print NR "-" $1 }' mobile.txt
2. NF: It holds the number of fields in the current input record.
3. FS: It contains the input field separator, which divides each input line into fields. By default, fields are separated by whitespace.
Example:
gawk 'BEGIN{FS=":"} {print $1, $6, $7}' /etc/passwd
4. RS: It stores the input record separator, which is a newline by default.
5. ORS: It stores the output record separator, which separates the output lines when Awk prints them.
6. OFS: It stores the output field separator, which separates the fields when Awk prints them.
Example:
gawk 'BEGIN{FS=":"; OFS="-"} {print $1, $6, $7}' /etc/passwd
Summary
As you have seen, the gawk command is a Linux utility for pattern scanning and text processing that is simple to start with yet powerful. It allows programmers to write tiny and effective programs in the form of statements that define text patterns to be searched for and the actions to take on matching lines.