Regular Expression in Linux
In this article, you will learn about regular expressions in Linux-based operating systems. We will go through what regex is, why we use it, the different types of regular expressions in Linux, and examples of each in detail. So sit down, grab a snack, and read right till the end!
What is Linux Regex?
First of all, regex is the abbreviation for “Regular Expressions”. However, they are not as regular as they sound! A regular expression is a pattern, built from ordinary characters and special characters called metacharacters, that helps us search data and match complex patterns of text.
Regular expressions in Linux are most commonly used with commands like grep, sed, ed, awk, and vi. We will focus on using regular expressions with the grep command in this article, with a few honorable mentions of other commands as well!
Regex is a really powerful notation for describing sets of character sequences, and it is used heavily on the command line. Regex is also called regexp.
Types of Regular Expressions in Linux
In this article, let us divide regular expressions into the following 3 types to make regex easier to understand. We will also look at how to use each expression in the terminal. The 3 types of regex are:
1. Basic regular expressions
2. Interval regular expressions
3. Extended regular expressions
Let us now take a closer look at each of these types:
1. Basic regular expressions
Before we look at the practical examples of the basic regular expressions, let us cumulatively look at the list of the basic regular expressions:
a. .
This basic regular expression matches any single character.
b. ^
This basic regular expression matches the start of the string.
c. $
This basic regular expression matches the end of the string.
d. *
This basic regular expression matches zero or more occurrences of the preceding character.
e. \
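This basic regular expression escapes the character that follows it, so that a special character is matched literally.
f. ()
This basic regular expression groups regular expressions. With grep, parentheses act as a group only in extended mode (-E); in basic mode they must be written as \( and \).
g. ?
This basic regular expression matches zero or one occurrence of the preceding character, which makes that character optional. Like the parentheses, it needs extended mode (-E) in grep.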
Let us now look at an example for each of the basic regexes:
To try out regular expressions we need some text to search. For the sake of an example, let us consider the file “fruits.txt”, which contains a very long list of fruits!
a. Using ‘dot’ to match string
Using the dot (.) expression, we can try to find a string even if we don’t know the full string. We can use the dot expression in place of the character we don’t know.
For example, searching fruits.txt for the pattern “App.e” gives us the lines that contain the text “Apple”, even though we specified the dot expression in the place of “l”.
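The dot example described above can be reproduced with a sketch like the following. The file contents are made up here, since the original list of fruits is not shown; only the idea of putting a dot in place of the “l” in “Apple” comes from the text.

```shell
# Made-up sample data standing in for fruits.txt (assumption)
fruits=$(mktemp)
printf '%s\n' "Apple" "Banana" "Cherry" "Custard apple" "Grape" > "$fruits"

# The dot matches whatever single character sits between "App" and "e"
result=$(grep "App.e" "$fruits")
echo "$result"

rm -f "$fruits"
```

With this sample data only the line “Apple” is printed; “Custard apple” is skipped because grep is case-sensitive by default.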
b. Using ‘caret’ to match the beginning of the string
We can use the caret (^) expression to search for lines that begin with the specified text.
For example, the pattern “^B” gives us all the lines that begin with the letter “B”.
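A minimal sketch of the caret search, again assuming a small made-up fruit list in place of the real fruits.txt:

```shell
# Made-up sample data standing in for fruits.txt (assumption)
fruits=$(mktemp)
printf '%s\n' "Apple" "Banana" "Blueberry" "Cherry" "Grape" > "$fruits"

# ^ anchors the match to the start of each line
result=$(grep "^B" "$fruits")
echo "$result"

rm -f "$fruits"
```

Here the output is “Banana” and “Blueberry”, the only two lines that start with “B”.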
c. Using ‘dollar’ to match the end of the string
Just like we have the expression “^” to match the beginning of a string, we also have the expression “$” to match the end of the string.
For example, the pattern “e$” prints all the lines that end with the letter “e”.
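The dollar search can be sketched the same way. The fruit list is an assumption, and the pattern is single-quoted so that the shell leaves the “$” alone:

```shell
# Made-up sample data standing in for fruits.txt (assumption)
fruits=$(mktemp)
printf '%s\n' "Apple" "Banana" "Cherry" "Grape" "Orange" > "$fruits"

# $ anchors the match to the end of each line
result=$(grep 'e$' "$fruits")
echo "$result"

rm -f "$fruits"
```

With this list the lines “Apple”, “Grape” and “Orange” are printed.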
d. Using ‘asterisk’ to find the repetition of a letter
Use the asterisk (*) expression to match repetitions of a letter in a word. It matches anywhere from zero repetitions of the preceding character to any number of them!
For example, the pattern “Ap*le” prints the lines that contain “A”, then any number of repetitions of the letter “p”, then “le”. It matches “Aple” and “Apple”, and it will even print out strings like “Appple”, “Appppppple”, “Appppppppppple” and so on if they exist. Because zero repetitions also count, it matches “Ale” as well.
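To see the zero-or-more behaviour, here is a small sketch. The words are invented purely to show different numbers of “p” and are not taken from the original file:

```shell
# Invented test words with zero, one, two and three "p" characters (assumption)
words=$(mktemp)
printf '%s\n' "Ale" "Aple" "Apple" "Appple" "Maple" > "$words"

# p* matches zero or more "p" between "A" and "le"
result=$(grep "Ap*le" "$words")
echo "$result"

rm -f "$words"
```

Every line except “Maple” is printed. “Maple” has no capital “A”, so the pattern never gets started on it.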
e. Using ‘backslash’ to match a special symbol
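If we want to search for a character that has a special meaning in regex, such as the dot (.), asterisk (*), dollar ($) or square bracket ([), we use the ‘backslash’ expression. We specify the special character we want to search for right after the backslash.
For example, the pattern “\.” displays only the lines that contain a literal dot. A backslash before a space displays all the lines that have a space in them, although a space is not a special character and matches itself even without the backslash.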
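The difference between an escaped and an unescaped metacharacter is easiest to see with the dot. This sketch assumes a made-up price list rather than the original fruits.txt:

```shell
# Made-up price list; only two lines contain a literal dot (assumption)
prices=$(mktemp)
printf '%s\n' 'Apple 1.50' 'Banana 2' 'Cherry 3.25' 'Grape 10' > "$prices"

# Escaped: \. matches a literal dot only
literal=$(grep '\.' "$prices")
echo "$literal"

# Unescaped: . matches any character, so every non-empty line counts
any_count=$(grep -c '.' "$prices")
echo "$any_count"

rm -f "$prices"
```

The escaped pattern prints the two lines with decimal prices, while the unescaped dot counts all four lines.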
f. Using ‘parentheses’ to match a group of regexp
Parentheses group part of a pattern so that it is treated as a single unit. In the simplest case, we can put the word we want to search for inside the parentheses. It must be noted that while using parentheses with the grep command, we must make use of the option “-E”, which enables extended regular expressions.
For example, the pattern “(fruit)” used with grep -E prints out all the lines with the text “fruit” in them.
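A sketch of the grouping search with the -E option follows. The sample lines are assumptions chosen so that some contain “fruit” and one does not:

```shell
# Made-up sample data standing in for fruits.txt (assumption)
fruits=$(mktemp)
printf '%s\n' "Apple" "Passion fruit" "Jackfruit" "Star fruit" > "$fruits"

# -E switches grep to extended regular expressions, where ( ) form a group
result=$(grep -E "(fruit)" "$fruits")
echo "$result"

rm -f "$fruits"
```

All three lines containing “fruit” are printed, including “Jackfruit”, because the group matches anywhere in the line.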
g. Using ‘?’ to make a character optional
The “?” expression makes the preceding character optional, so the pattern matches whether or not that character is present. Like the parentheses, it needs the option “-E” with grep.
For example, the pattern “^Ch?” prints all the lines that start with either “C” or “Ch”, because the “h” is optional. However, if we run the exact same command but without the “?” expression, we will get only the lines that start with “Ch”.
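The effect of the “?” can be compared side by side in a sketch like this one. The pattern “^Ch?” and the fruit list are assumptions, since the original command is not shown:

```shell
# Made-up sample data standing in for fruits.txt (assumption)
fruits=$(mktemp)
printf '%s\n' "Apple" "Cherry" "Coconut" "Chikoo" > "$fruits"

# With ?, the "h" is optional: lines starting with "C" or "Ch" match
with_optional=$(grep -E "^Ch?" "$fruits")
echo "$with_optional"

# Without ?, the "h" is required: only lines starting with "Ch" match
without_optional=$(grep "^Ch" "$fruits")
echo "$without_optional"

rm -f "$fruits"
```

The first search prints “Cherry”, “Coconut” and “Chikoo”, while the second drops “Coconut”.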
2. Interval regular expressions
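These expressions match lines based on how many times in a row the character we specify occurs. They are more sophisticated than the basic expressions, yet still simple. With grep they are written as shown here in extended mode (-E); in basic mode the braces must be escaped as \{ and \}. Let us look at them: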
a. {n}
This interval regular expression matches the preceding character repeated exactly “n” times.
b. {n,m}
This interval regular expression matches the preceding character repeated at least “n” times but not more than “m” times, meaning anywhere from “n” to “m” repetitions.
c. {n,}
This interval regular expression matches the preceding character repeated “n” times or more.
Let us now look at an example for each of the 3 interval regular expressions along with the grep command:
a. Using the “{n}” expression
Here we use the expression {n} to search for words that have exactly 2 occurrences of the character “p” in a row, for example with the pattern “Ap{2}le”.
b. Using the “{n,m}” expression
Here we use the expression {n,m} to search for words that have at least 1 and at most 2 occurrences of “p” in a row, for example with the pattern “Ap{1,2}le”.
c. Using the “{n,}” expression
Here we use the expression {n,} to search for words that have the character “p” at least twice in a row, for example with the pattern “Ap{2,}le”.
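The three interval expressions can be tried together in one sketch. The words are invented to contain one, two and three “p” characters, and the patterns put the interval between “A” and “le” because an interval on its own, such as “p{2}”, would also match inside a longer run like “ppp”.

```shell
# Invented words with one, two and three "p" characters (assumption)
words=$(mktemp)
printf '%s\n' "Aple" "Apple" "Appple" "Grape" > "$words"

# {2}: exactly two "p" between "A" and "le"
exactly=$(grep -E "Ap{2}le" "$words")
echo "$exactly"

# {1,2}: one or two "p"
between=$(grep -E "Ap{1,2}le" "$words")
echo "$between"

# {2,}: two or more "p"
at_least=$(grep -E "Ap{2,}le" "$words")
echo "$at_least"

rm -f "$words"
```

The first search prints only “Apple”, the second prints “Aple” and “Apple”, and the third prints “Apple” and “Appple”.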
3. Extended regular expressions
These expressions control how many times the previous character may occur. In extended mode (grep -E) they are written as + and ?; the backslashed forms shown here are how GNU grep accepts them in basic mode. The following are the extended regular expressions:
a. \+
This extended regular expression matches one or more occurrences of the previous character.
b. \?
This extended regular expression matches zero or one occurrence of the previous character.
Let us look at an example for each of the 2 extended regular expressions:
a. Using “\+”
For example, the pattern “a\+t” prints all the lines where the character “t” is preceded by one or more occurrences of the character “a”.
b. Using “\?”
For example, the pattern “a\?t” prints all the lines where the character “t” is preceded by the character “a”, and also the lines where only the character “t” is present.
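Both expressions can be checked with a sketch along these lines. The word list is made up, and the backslashed forms rely on GNU grep, which is the usual grep on Linux; other grep implementations may treat them differently.

```shell
# Made-up sample data (assumption): only "Dates" contains "at"
fruits=$(mktemp)
printf '%s\n' "Dates" "Walnut" "Kiwi" "Peach" > "$fruits"

# a\+t: one or more "a" followed by "t"
plus_basic=$(grep 'a\+t' "$fruits")
echo "$plus_basic"

# a\?t: an optional "a" followed by "t", so a lone "t" also matches
optional_basic=$(grep 'a\?t' "$fruits")
echo "$optional_basic"

# The same "one or more" search in extended mode, without the backslash
plus_extended=$(grep -E 'a+t' "$fruits")
echo "$plus_extended"

rm -f "$fruits"
```

The “one or more” search prints only “Dates”, while the optional form also prints “Walnut”, whose “t” has no “a” in front of it.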
Brace expansion in Linux
Here is a bonus feature that is often mentioned alongside regular expressions: brace expansion, written with {}. It is a feature of shells such as Bash rather than a regular expression. The shell expands the braces into a list of strings before the command runs, which lets us specify a range of things to perform operations on.
One real-life use of brace expansion is downloading a consecutive range of numbered pages using the wget command.
Brace expansion can be used along with many other commands as well.
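A few brace expansions can be sketched as follows. Brace expansion is a Bash feature rather than part of plain POSIX sh, so the sketch calls bash explicitly; the file names and the example.com URLs are placeholders, not taken from the original article.

```shell
# A numeric range expands to one word per number
list=$(bash -c 'echo file{1..3}.txt')
echo "$list"

# A comma-separated list expands to one word per item
pair=$(bash -c 'echo {apple,banana}.jpg')
echo "$pair"

# The kind of URL list that a command such as wget would receive
# (placeholder URLs; printed here instead of being downloaded)
urls=$(bash -c 'printf "%s\n" https://example.com/page{1..3}.html')
echo "$urls"
```

Because the shell does the expansion before the command starts, a command such as wget simply sees three separate URLs as arguments.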
Table of metacharacters
Even though we have used most of the metacharacters, let us look at everything in one table to get a better picture:
NO | EXPRESSION | DESCRIPTION |
1 | . | This metacharacter matches any single character. |
2 | ^ | This metacharacter matches the start of the string; as the first character inside square brackets, it matches characters not in the set. |
3 | $ | This metacharacter matches the end of the string. |
4 | * | This metacharacter matches the preceding character zero or more times. |
5 | \ | This metacharacter escapes the next character so that it is matched literally. |
6 | () | This metacharacter groups regular expressions. |
7 | ? | This metacharacter matches the preceding character zero or one time. |
8 | + | This metacharacter matches the preceding character one or more times. |
9 | {N} | Preceding character is matched exactly N times. |
10 | {N,} | Preceding character is matched N times or more. |
11 | {N,M} | Preceding character is matched at least N times, but not more than M times. |
12 | - | This metacharacter represents a range inside square brackets, such as [a-z]. |
13 | \b | This metacharacter matches the empty string at the edge of a word. |
14 | \B | This metacharacter matches the empty string if it is not at the edge of a word. |
15 | \< | This metacharacter matches the empty string at the beginning of a word. |
16 | \> | This metacharacter matches the empty string at the end of a word. |
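The word-boundary metacharacters at the end of the table are not demonstrated elsewhere in the article, so here is a small sketch. The sample lines are made up, and these escapes are GNU grep extensions, so other grep implementations may not support them.

```shell
# Made-up lines where "apple" appears at different positions in a word (assumption)
words=$(mktemp)
printf '%s\n' "apple" "pineapple" "apple pie" "crabapples" > "$words"

# \< : "apple" must begin a word
word_start=$(grep '\<apple' "$words")
echo "$word_start"

# \> : "apple" must end a word
word_end=$(grep 'apple\>' "$words")
echo "$word_end"

# \b on both sides: "apple" must be a whole word
whole_word=$(grep '\bapple\b' "$words")
echo "$whole_word"

# \B : "apple" must NOT begin a word
inside_word=$(grep '\Bapple' "$words")
echo "$inside_word"

rm -f "$words"
```

Only “apple” and “apple pie” contain “apple” as a whole word, whereas “pineapple” and “crabapples” are the lines where it starts in the middle of a word.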
Shell scripting using Regular Expression in Linux
We can also use regular expressions in shell scripts. Here are some examples:
1. Using “^” in shell scripting
Here is a shell program to print words starting with the letter “B”.
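The original script is not reproduced here, so the following is only one possible sketch. The function name print_b_words and the word list are hypothetical.

```shell
#!/bin/sh
# print_b_words is a hypothetical helper: it prints each argument that starts with "B"
print_b_words() {
    for word in "$@"; do
        # grep -q prints nothing and only reports success or failure
        if printf '%s\n' "$word" | grep -q '^B'; then
            printf '%s\n' "$word"
        fi
    done
}

result=$(print_b_words Apple Banana Blueberry Cherry)
echo "$result"
```

Using grep -q inside the if statement lets the script branch on whether the pattern matched, without printing grep's own output.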
2. Using “*” in shell scripting
Here is a shell program to print words having occurrences of “ap” in them.
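One way such a script might look is sketched below. The pattern “app*” (an “a”, a “p”, then zero or more further “p”) is an assumption about how the asterisk was used, and the function name print_ap_words and the word list are hypothetical.

```shell
#!/bin/sh
# print_ap_words is a hypothetical helper: it prints each argument that contains "ap"
print_ap_words() {
    for word in "$@"; do
        # app* means "a", then "p", then zero or more extra "p"
        if printf '%s\n' "$word" | grep -q 'app*'; then
            printf '%s\n' "$word"
        fi
    done
}

result=$(print_ap_words apple grape banana pineapple cherry)
echo "$result"
```

Note that the shorter pattern “ap*” would match any word containing an “a”, because the asterisk allows zero occurrences of “p”.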
3. Using “?” in shell scripting
Here is a shell program to print words having occurrences of “ch” in them.
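A possible sketch of this script is shown below, assuming the pattern “ch?” with grep -E. The function name print_ch_words and the word list are hypothetical. Because the “?” makes the “h” optional, this pattern also accepts words that contain a “c” without an “h”.

```shell
#!/bin/sh
# print_ch_words is a hypothetical helper: it prints each argument matching "ch?"
print_ch_words() {
    for word in "$@"; do
        # -E enables ?, which makes the "h" after "c" optional
        if printf '%s\n' "$word" | grep -Eq 'ch?'; then
            printf '%s\n' "$word"
        fi
    done
}

result=$(print_ch_words cherry peach coconut banana)
echo "$result"
```

With these words, “cherry” and “peach” match through “ch”, and “coconut” matches through the “c” alone.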
Summary
As you have seen, regular expressions are a simple set of operators that make life so much easier, as they improve the efficiency of your workflow. You have now learned what regular expressions are, why they are used, and the types of regular expressions, where we covered the basic, interval, and extended types along with examples.