Regular Expression in Linux

In this article, you will learn about regular expressions in Linux-based operating systems. We will go through what regex is, why we use it, the different types of regular expressions in Linux, and examples of each in detail. So sit down, grab a snack, and read right till the end!

What is Linux Regex?

First of all, regex is the abbreviation for “Regular Expressions”. However, they are not as regular as they sound! A regular expression is a pattern built from ordinary and special characters that helps us search data and match complex patterns of text in Linux-based operating systems.

Regular expressions in Linux are most commonly used with commands like grep, sed, ed, awk, and vi. In this article we will focus on using regular expressions with the grep command, with a few honorable mentions of other commands as well!

Regex is a really powerful notation for describing sequences of characters, and many command-line tools understand it. Regex is also called regexp.

Types of Regular Expressions in Linux

In this article, let us divide regular expressions into the following 3 types while trying to understand regex. We will also look at how to use each expression in the terminal. The 3 types of regex are:

1. Basic regular expressions

2. Interval regular expressions

3. Extended regular expressions

Let us now take a closer look at each of these types:

1. Basic regular expressions

Before we look at practical examples of the basic regular expressions, let us look at the full list:

a. .

This basic regular expression matches any single character.

b. ^

This basic regular expression matches the start of the line.

c. $

This basic regular expression matches the end of the line.

d. *

This basic regular expression matches the preceding character zero or more times.

e. \

This basic regular expression escapes the character that follows it, so that a special character is matched literally.

f. ()

This basic regular expression groups regular expressions.

g. ?

This regular expression matches the preceding character zero or one time, which makes that character optional.

Let us now look at an example for each of the basic regexes:

To try out regular expressions we need a text file. For the sake of an example, let us consider the file “fruits.txt”, which contains a very long list of fruits!

a. Using ‘dot’ to match string

Using the dot (.) expression, we can find a string even if we don’t know the full string. We use the dot in place of the character we don’t know.

For example, if we write a dot in place of “l” in the text “Apple” (the pattern “App.e”), grep still gives us the lines that contain the text “Apple”.
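
The dot example can be tried out with a minimal sketch like the one below. It assumes a small made-up list of fruits written to a temporary file, since the real contents of fruits.txt are not shown here; the variable names are only illustrative.

```shell
# Hypothetical sample data standing in for fruits.txt
fruits=$(mktemp)
printf '%s\n' Apple Apricot Banana Grape > "$fruits"

# The dot stands for any single character, here the unknown "l"
dot_match=$(grep 'App.e' "$fruits")
echo "$dot_match"

rm -f "$fruits"
```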

b. Using ‘caret’ to match the beginning of the string

We can use the caret (^) expression to search for lines that begin with the specified text.

For example, the pattern “^B” gives us all the lines beginning with the letter “B”.
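
As a rough sketch of the caret in action, again assuming a made-up fruit list in a temporary file:

```shell
# Hypothetical sample data standing in for fruits.txt
fruits=$(mktemp)
printf '%s\n' Apple Banana Blueberry Cranberry > "$fruits"

# ^ anchors the match to the start of the line
starts_with_b=$(grep '^B' "$fruits")
echo "$starts_with_b"

rm -f "$fruits"
```

Cranberry contains a lowercase “b”, but it is not printed because the match must sit at the start of the line and grep is case-sensitive by default.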

c. Using ‘dollar’ to match the end of the string

Just like we have the expression “^” to match the beginning of a string, we also have the expression “$” to match the end of the string.

For example, the pattern “e$” prints all the lines that end with the letter “e”.
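
A minimal sketch of the dollar anchor, assuming the same kind of made-up fruit list:

```shell
# Hypothetical sample data standing in for fruits.txt
fruits=$(mktemp)
printf '%s\n' Apple Banana Grape Cherry > "$fruits"

# $ anchors the match to the end of the line.
# Single quotes stop the shell from treating $ as a variable prefix.
ends_with_e=$(grep 'e$' "$fruits")
echo "$ends_with_e"

rm -f "$fruits"
```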

d. Using ‘asterisk’ to find the repetition of a letter

Use the asterisk (*) expression to match repetitions of a letter in a word. The letter before the asterisk may appear any number of times, from zero upwards!

For example, the pattern “Ap*le” matches “Aple” with any number of repetitions of the letter “p”, meaning that it will print out strings like “Aple”, “Apple”, “Appple”, “Appppppple” and so on if they exist. Because zero repetitions are allowed, it also matches “Ale”.
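
The sketch below illustrates the asterisk. The words in the sample file are invented purely to show different numbers of “p”, so treat them as assumptions rather than real fruit names.

```shell
# Made-up strings with zero, one, two, and three "p" characters
words=$(mktemp)
printf '%s\n' Ale Aple Apple Appple Banana > "$words"

# p* means "zero or more p", so every A...le variant matches
star_match=$(grep 'Ap*le' "$words")
echo "$star_match"

rm -f "$words"
```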

e. Using ‘backslash’ to match a special symbol

If we want to search for a character that has a special meaning in a regular expression, such as the dot (.), asterisk (*), or dollar sign ($), we use the ‘backslash’ expression. We specify the special character we want to search for right after the backslash.

In our example, the character after the backslash is a space, so the command displays all the lines that have a space in them.
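
To see why the backslash matters, here is a small sketch that searches for a literal dot. The price list is a made-up stand-in, not data from this article.

```shell
# Hypothetical price list; only two lines contain a literal dot
prices=$(mktemp)
printf '%s\n' 'Apple 1.50' 'Banana 2' 'Cherry 3.25' > "$prices"

# \. matches a real dot
literal_dot=$(grep '\.' "$prices")
echo "$literal_dot"

# An unescaped dot matches any character, so every non-empty line counts
any_char_count=$(grep -c '.' "$prices")
echo "$any_char_count"

rm -f "$prices"
```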

f. Using ‘parentheses’ to match a group of regexp

If we simply want to search for a piece of text in a file, we can put the word we want to search for inside the parentheses. Note that to use parentheses for grouping with the grep command, we must use the option “-E”, which enables extended regular expressions.

For example, the pattern “(fruit)” used with “grep -E” prints out all the lines with the text “fruit” in them.
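
A minimal sketch of grouping with “grep -E”, assuming a made-up list in which some names contain the word “fruit”:

```shell
# Hypothetical sample data standing in for fruits.txt
fruits=$(mktemp)
printf '%s\n' Apple 'Dragon fruit' 'Passion fruit' Grapefruit > "$fruits"

# -E enables extended regular expressions, where ( ) group a sub-pattern
group_match=$(grep -E '(fruit)' "$fruits")
echo "$group_match"

rm -f "$fruits"
```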

g. Using ‘?’ to make a character optional

Use the “?” expression when the character before it is optional, that is, when we want lines that match the pattern both with and without that character. Like the parentheses, “?” needs the “-E” option of grep.

For example, the pattern “^Ch?” prints all the lines that start with either “C” or “Ch”. However, if we run the exact same command without the “?” expression, we will get only the lines that start with “Ch”.
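
The difference can be sketched as follows, assuming a made-up fruit list and the pattern “^Ch?” as one plausible reading of the example:

```shell
# Hypothetical sample data standing in for fruits.txt
fruits=$(mktemp)
printf '%s\n' Cherry Coconut Cranberry Banana > "$fruits"

# h? makes the "h" optional: lines starting with "C" or "Ch" match
optional_h=$(grep -E '^Ch?' "$fruits")
echo "$optional_h"

# Without the ?, the "h" is required
required_h=$(grep -E '^Ch' "$fruits")
echo "$required_h"

rm -f "$fruits"
```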

2. Interval regular expressions

These expressions match lines based on how many times the character we specify occurs in a row. They are more sophisticated, yet simple to use with “grep -E”. Let us look at them:

a. {n}

This interval regular expression matches the preceding character when it appears exactly “n” times.

b. {n,m}

This interval regular expression matches the preceding character when it appears at least “n” times but not more than “m” times, meaning it matches between “n” and “m” repetitions of the character.

c. {n,}

This interval regular expression matches the preceding character that appears “n” number of times or more.

Let us now look at an example for each of the 3 interval regular expressions along with the grep command:

a. Using the “{n}” expression

For example, the expression “p{2}” searches for words that have 2 consecutive occurrences of the character “p”.
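
A rough sketch of “{n}” with “grep -E”, assuming a made-up fruit list:

```shell
# Hypothetical sample data standing in for fruits.txt
fruits=$(mktemp)
printf '%s\n' Apple Grape Pineapple Papaya > "$fruits"

# p{2} matches two lowercase "p" characters in a row.
# A line with three or more in a row would also match, because it contains "pp".
exactly_two=$(grep -E 'p{2}' "$fruits")
echo "$exactly_two"

rm -f "$fruits"
```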

b. Using the “{n,m}” expression

For example, the expression “p{1,2}” searches for words that have at least 1 occurrence of “p” and at most 2 occurrences of “p” in a row.
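
The sketch below wraps the interval in fixed letters (“Ap{1,2}le”) so that the upper limit becomes visible; on its own, “p{1,2}” would match any line containing a “p”. The sample strings are invented for illustration.

```shell
# Made-up strings with zero to three "p" characters
words=$(mktemp)
printf '%s\n' Ale Aple Apple Appple > "$words"

# Between one and two "p" characters are allowed between "A" and "le"
one_to_two=$(grep -E 'Ap{1,2}le' "$words")
echo "$one_to_two"

rm -f "$words"
```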

c. Using the “{n,}” expression

For example, the expression “p{2,}” searches for words that have the character “p” at least twice in a row.
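
A minimal sketch of “{n,}”, using invented sample strings:

```shell
# Made-up strings with one to three "p" characters
words=$(mktemp)
printf '%s\n' Aple Apple Appple Grape > "$words"

# p{2,} needs at least two "p" characters in a row, with no upper limit
two_or_more=$(grep -E 'p{2,}' "$words")
echo "$two_or_more"

rm -f "$words"
```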

3. Extended regular expressions

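These expressions help us find text in which one character is repeated or optional before another piece of the string. In grep’s default (basic) mode they are written with a backslash; with the “-E” option the backslash is dropped. The following are the extended regular expressions: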

a. \+

This extended regular expression matches one or more occurrences of the previous character.

b. \?

This extended regular expression matches zero or one occurrence of the previous character.

Let us look at an example for each of the 2 extended regular expressions:

a. Using “\+”

For example, the pattern “a\+t” prints all the lines where the character “t” is preceded by one or more occurrences of the character “a”.
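
A small sketch of “\+”, assuming GNU grep (the backslash form is an extension in basic mode) and a made-up word list:

```shell
# Hypothetical sample data standing in for fruits.txt
fruits=$(mktemp)
printf '%s\n' Date Kumquat Cantaloupe Fig > "$fruits"

# a\+t needs at least one "a" directly before the "t"
plus_match=$(grep 'a\+t' "$fruits")
echo "$plus_match"

# The same pattern in extended syntax, without the backslash
plus_match_ere=$(grep -E 'a+t' "$fruits")

rm -f "$fruits"
```

Cantaloupe is left out because its “t” follows an “n”, not an “a”.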

b. Using “\?”

For example, the pattern “a\?t” prints all the lines where the character “t” is preceded by the character “a”, and also the lines where only the character “t” is present.
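
A matching sketch for “\?”, again assuming GNU grep and the same kind of made-up list:

```shell
# Hypothetical sample data standing in for fruits.txt
fruits=$(mktemp)
printf '%s\n' Date Kumquat Cantaloupe Fig > "$fruits"

# a\?t makes the "a" optional, so any line with a "t" matches
question_match=$(grep 'a\?t' "$fruits")
echo "$question_match"

rm -f "$fruits"
```

This time Cantaloupe is printed as well, because a “t” on its own is enough.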

Brace expansion in Linux

Here is a bonus feature that looks similar to a regular expression: {}. Brace expansion is performed by the shell itself rather than by grep, and it lets us specify a list or a range of things to perform operations on.

One real-life example of brace expansion is downloading a continuous range of web pages using the wget command.

We can use brace expansion along with many other commands as well.
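
Here is a hedged sketch of brace expansion. Because it is a feature of Bash (and similar shells) rather than of plain POSIX sh, the sketch calls bash explicitly; the file names and the example.com address are placeholders, not real resources.

```shell
# A numeric range expands to one word per number
files=$(bash -c 'echo file{1..3}.txt')
echo "$files"

# A comma-separated list works the same way
names=$(bash -c 'echo {Apple,Banana,Cherry}.txt')
echo "$names"

# The same expansion builds a run of URLs. Passing them to wget
# would fetch each page in turn:
#   wget https://example.com/page{1..3}.html
urls=$(bash -c 'printf "%s\n" https://example.com/page{1..3}.html')
echo "$urls"
```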

Table of metacharacters

Even though we have used most of the metacharacters, let us look at everything in one table to get a better picture:

NO   EXPRESSION   DESCRIPTION
1    .            Matches any single character.
2    ^            Matches the start of the line; inside square brackets, [^...] matches characters not in the list.
3    $            Matches the end of the line.
4    *            Matches the preceding character zero or more times.
5    \            Escapes the following character so that it is matched literally.
6    ()           Groups regular expressions.
7    ?            Matches the preceding character zero or one time.
8    +            Matches the preceding character one or more times.
9    {N}          Preceding character is matched exactly N times.
10   {N,}         Preceding character is matched N times or more.
11   {N,M}        Preceding character is matched at least N times, but not more than M times.
12   -            Represents a range inside square brackets, for example [a-z].
13   \b           Matches the empty string at the edge of a word.
14   \B           Matches the empty string if it is not at the edge of a word.
15   \<           Matches the empty string at the beginning of a word.
16   \>           Matches the empty string at the end of a word.
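
The word-boundary metacharacters from the table have not appeared in the earlier examples, so here is a brief sketch of them. It assumes GNU grep, where \b, \B, \< and \> are supported, and uses made-up sample lines.

```shell
# Hypothetical sample lines
lines=$(mktemp)
printf '%s\n' 'apple pie' 'pineapple juice' 'crab apple' > "$lines"

# \b on both sides: "apple" only as a whole word
whole_word=$(grep '\bapple\b' "$lines")
echo "$whole_word"

# \B: "apple" that does not begin at a word edge
inside_word=$(grep '\Bapple' "$lines")
echo "$inside_word"

# \>: "apple" at the end of a word, which includes "pineapple"
word_end_count=$(grep -c 'apple\>' "$lines")
echo "$word_end_count"

rm -f "$lines"
```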

Shell scripting using Regular Expression in Linux

We can also use regular expressions in shell scripts. Here are some examples:

1. Using “^” in shell scripting

Here is a shell program to print words starting with the letter “B”.
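
One way such a program might look is sketched below. The word list and variable names are assumptions made for illustration; the script reads the file line by line and keeps the words that grep reports as matching “^B”.

```shell
#!/bin/sh
# Hypothetical sample data standing in for fruits.txt
fruits=$(mktemp)
printf '%s\n' Apple Banana Blueberry Cherry > "$fruits"

b_words=""
while IFS= read -r word; do
    # grep -q prints nothing and only sets the exit status
    if printf '%s\n' "$word" | grep -q '^B'; then
        # Append the word, separated by a space after the first one
        b_words="${b_words:+$b_words }$word"
    fi
done < "$fruits"

echo "$b_words"
rm -f "$fruits"
```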

2. Using “*” in shell scripting

Here is a shell program to print words having occurrences of “ap” in them.
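
A possible version of this program is sketched below, assuming the pattern “app*” (an “a” followed by one or more “p”) and a made-up word list; the script also reports how many lines matched.

```shell
#!/bin/sh
# Hypothetical sample data standing in for fruits.txt
fruits=$(mktemp)
printf '%s\n' Grape Pineapple Cherry Banana > "$fruits"

# "app*" is "ap" followed by zero or more extra "p" characters
ap_count=$(grep -c 'app*' "$fruits")
ap_words=$(grep 'app*' "$fruits")

if [ "$ap_count" -gt 0 ]; then
    echo "Found $ap_count words containing ap:"
    echo "$ap_words"
fi

rm -f "$fruits"
```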

3. Using “?” in shell scripting

Here is a shell program to print words having occurrences of “ch” in them.
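
The sketch below is one way to write it, using a made-up lowercase word list. It runs both “ch?” and plain “ch” so that the effect of the “?” is visible: with “ch?” the “h” is optional, so a word with only a “c” is printed too.

```shell
#!/bin/sh
optional_h_words=""
ch_words=""

# Hypothetical word list used only for this illustration
for word in peach cherry avocado banana; do
    # ch? matches "c" with or without a following "h"
    if printf '%s\n' "$word" | grep -Eq 'ch?'; then
        optional_h_words="${optional_h_words:+$optional_h_words }$word"
    fi
    # ch requires both letters
    if printf '%s\n' "$word" | grep -q 'ch'; then
        ch_words="${ch_words:+$ch_words }$word"
    fi
done

echo "ch? matched: $optional_h_words"
echo "ch matched:  $ch_words"
```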

Summary

As you have seen, regular expressions are a simple set of operators that make life so much easier as they improve the efficiency of your workflow. You have now learned what regular expressions are, why they are used, and the types of regular expressions, where we covered the basic, interval, and extended types along with examples.
