Learn Python Regex Tutorial – Python Regular Expression Functions


Python regular expressions are one of my favourite topics, so let’s dive straight into this Python Regex tutorial.

Here, we will discuss the metacharacters, functions, and examples of Python regex. Along the way, we will also cover Python findall() and the MULTILINE option.

So, let’s start with a short Python Regex cheat sheet.


What is the Python Regular Expression (Regex)?

Essentially, a Python regular expression is a sequence of characters that defines a search pattern.

We can then use this pattern in a string-searching algorithm to “find” or “find and replace” on strings. You may have seen this feature in Microsoft Word as well.

In this Python Regex tutorial, we will learn the basics of regular expressions in Python. For this, we will use the ‘re’ module.

Let’s import it before we begin.

>>> import re

Python Regex – Metacharacters

Each character in a Python Regex is either a metacharacter or a regular character. A metacharacter has a special meaning, while a regular character matches itself.

Python has the following metacharacters:

Metacharacter	Description
^	Matches the start of the string
.	Matches any single character except a newline. Inside square brackets, a dot matches a literal dot
[ ]	A bracket expression matches a single character from the ones inside it. [abc] matches ‘a’, ‘b’, or ‘c’; [a-z] matches any character from ‘a’ to ‘z’; [a-cx-z] matches ‘a’, ‘b’, ‘c’, ‘x’, ‘y’, or ‘z’
[^ ]	Matches a single character that is not one of those in the brackets. [^abc] matches any character except ‘a’, ‘b’, and ‘c’
( )	Parentheses define a marked subexpression, also called a block, or a capturing group
\t, \n, \r, \f	Tab, newline, return, form feed
*	Matches the preceding element zero or more times. ab*c matches ‘ac’, ‘abc’, ‘abbc’, and so on; [ab]* matches ‘’, ‘a’, ‘b’, ‘ab’, ‘ba’, ‘aba’, and so on; (ab)* matches ‘’, ‘ab’, ‘abab’, ‘ababab’, and so on
{m,n}	Matches the preceding element at least m times and at most n times. a{2,4} matches ‘aa’, ‘aaa’, and ‘aaaa’
{m}	Matches the preceding element exactly m times
?	Matches the preceding element zero or one time. ab?c matches ‘ac’ or ‘abc’
+	Matches the preceding element one or more times. ab+c matches ‘abc’, ‘abbc’, ‘abbbc’, and so on, but not ‘ac’
|	The choice operator matches either the expression before it or the one after. abc|def matches ‘abc’ or ‘def’
\w	Matches a single word character (a-z, A-Z, 0-9, and the underscore). \W matches a single non-word character
\b	Matches the boundary between word and non-word characters
\s	Matches a single whitespace character. \S matches a single non-whitespace character
\d	Matches a single decimal digit character (0-9)
\	A single backslash inhibits a character’s specialness. Examples: \. \\ \* When unsure whether a punctuation character has a special meaning, put a \ before it: \@
$	A dollar matches the end of the string
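To see a few of these metacharacters in action, here is a minimal sketch. The sample strings are made up for illustration, and it uses re.fullmatch(), which this tutorial does not otherwise cover; it succeeds only when the entire string fits the pattern.

```python
import re

# Each tuple is (pattern, text that should match, text that should not match).
samples = [
    (r"ab*c", "abbc", "adc"),     # * : zero or more of the preceding element
    (r"ab+c", "abc", "ac"),       # + : one or more
    (r"ab?c", "ac", "abbc"),      # ? : zero or one
    (r"a{2,4}", "aaa", "a"),      # {m,n} : between m and n repetitions
    (r"abc|def", "def", "abd"),   # | : either alternative
    (r"[^abc]", "d", "a"),        # [^ ] : any single character not listed
]

for pattern, good, bad in samples:
    # fullmatch() returns a match object on success and None on failure.
    assert re.fullmatch(pattern, good) is not None
    assert re.fullmatch(pattern, bad) is None
    print(f"{pattern!r}: matches {good!r}, rejects {bad!r}")
```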

A raw string literal does not treat backslashes in any special way. To write one, prepend an ‘r’ to the pattern.

Without this, you have to write '\\\\' to match a single backslash character. With a raw string, you only need r'\\'.
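The following sketch checks the raw-string rule on a made-up Windows-style path (the variable names are only for illustration). Both spellings produce the same two-character regex, which matches one literal backslash.

```python
import re

path = "C:\\Users"        # normal literal: the 8-character text C:\Users

# Both patterns below are the same regex: an escaped backslash.
plain_pattern = "\\\\"    # normal literal: four backslashes in the source
raw_pattern = r"\\"       # raw literal: two backslashes are enough

assert plain_pattern == raw_pattern

# There is exactly one backslash in path, so findall() finds one match.
print(len(re.findall(raw_pattern, path)))
```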

Regular characters match themselves.

Rules for a Match

So, how does this work? The following rules must be met:

The search scans the string start to end.
The whole pattern must match, but not necessarily the whole string.
The search stops at the first match.

If a match is found, search() returns a match object, and its group() method returns the matching text. If not, search() returns None.

>>> print(re.search('na','no'))

Output

None
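As a small sketch of these rules, the snippet below searches a made-up string and inspects the result. span() is a method of the match object that reports where the match starts and ends.

```python
import re

text = "banana"

# The pattern only has to match somewhere in the string, not the whole string.
first = re.search(r"an", text)
print(first.group())   # the matched text
print(first.span())    # (start, end) of the first match only

# search() returns None when the pattern matches nowhere.
missing = re.search(r"na", "no")
print(missing)
```

Although ‘an’ occurs twice in ‘banana’, only the first occurrence is reported.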

Let’s look at a couple of important functions now.

Python Regular Expression Functions

We have a few functions to help us use Python regex.

1. match()

match() takes two arguments: a pattern and a string. If the pattern matches at the beginning of the string, it returns a match object. Otherwise, it returns None.

Let’s take a few Python regular expression match examples.

>>> print(re.match('center','centre'))

Output

None

>>> print(re.match(r'...\w\we','centre'))

Output

<_sre.SRE_Match object; span=(0, 6), match='centre'>
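One point worth checking for yourself is that match() only looks at the beginning of the string. The sketch below, using the same word as above, shows a pattern that fits at position 0 and one that does not.

```python
import re

word = "centre"

# match() only succeeds if the pattern fits at position 0.
at_start = re.match(r"cen", word)
not_at_start = re.match(r"tre", word)

print(at_start.group(), at_start.span())
print(not_at_start)   # None, because 'tre' is not at the start

# search() scans forward, so it does find 'tre' later in the string.
print(re.search(r"tre", word).span())
```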

2. search()

search(), like match(), takes two arguments: the pattern and the string to be searched. Unlike match(), it looks for the pattern anywhere in the string, not just at the beginning.

Let’s take a few examples.

>>> match=re.search('aa?yushi','ayushi')
>>> match.group()

Output

‘ayushi’

>>> match=re.search('aa?yushi?','ayush ayushi')
>>> match.group()

Output

‘ayush’

>>> match=re.search(r'\w*end','Hey! What are your plans for the weekend?')
>>> match.group()

Output

‘weekend’

>>> match=re.search(r'^\w*end','Hey! What are your plans for the weekend?')
>>> match.group()

Output

Traceback (most recent call last):
  File "<pyshell#337>", line 1, in <module>
    match.group()
AttributeError: 'NoneType' object has no attribute 'group'

Here, an AttributeError was raised because search() found no match and returned None. This is because the ^ specifies that the pattern must be at the beginning of the string.
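A common way to avoid this error is to test the result before calling group(). Here is a minimal sketch; first_match is a hypothetical helper written for this example, not part of the re module.

```python
import re

sentence = "Hey! What are your plans for the weekend?"

def first_match(pattern, text):
    """Return the matched text, or None if the pattern is not found."""
    found = re.search(pattern, text)
    # search() returns None on failure, so test it before calling group().
    return found.group() if found is not None else None

print(first_match(r"\w*end", sentence))    # weekend
print(first_match(r"^\w*end", sentence))   # None
```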

Let’s try searching for space.

>>> match=re.search(r'i\sS','Ayushi Sharma')
>>> match.group()

Output

‘i S’

>>> match=re.search(r'\w+c{2}\w*','Occam\'s Razor')
>>> match.group()

Output

‘Occam’

It will take some practice to get into the habit of remembering what the metacharacters mean.

But since there aren’t that many of them, this should not take long.

Python Regex Examples

Let’s try crafting a Python regex for an email address. Hmm, so what does one look like? It looks like this: ayushiwasthere@gmail.com

Let’s try the following code:

>>> match=re.search(r'[\w.-]+@[\w-]+\.[\w]+','Please mail it to ayushiwasthere@gmail.com')
>>> match.group()

Output

'ayushiwasthere@gmail.com'

It worked perfectly!

Here, if you had typed [\w-.] instead of [\w.-], it would have raised the following error:

>>> match=re.search(r'[\w-.]+@[\w-]+\.[\w]+','Please mail it to ayushiwasthere@gmail.com')

Output

Traceback (most recent call last):
  File "<pyshell#347>", line 1, in <module>
    match=re.search(r'[\w-.]+@[\w-]+\.[\w]+','Please mail it to ayushiwasthere@gmail.com')
  File "C:\Users\lifei\AppData\Local\Programs\Python\Python36-32\lib\re.py", line 182, in search
    return _compile(pattern, flags).search(string)
  File "C:\Users\lifei\AppData\Local\Programs\Python\Python36-32\lib\re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\lifei\AppData\Local\Programs\Python\Python36-32\lib\sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\lifei\AppData\Local\Programs\Python\Python36-32\lib\sre_parse.py", line 856, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, False)
  File "C:\Users\lifei\AppData\Local\Programs\Python\Python36-32\lib\sre_parse.py", line 415, in _parse_sub
    itemsappend(_parse(source, state, verbose))
  File "C:\Users\lifei\AppData\Local\Programs\Python\Python36-32\lib\sre_parse.py", line 547, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
sre_constants.error: bad character range \w-. at position 1

This is because a dash (-) inside square brackets normally indicates a range, and \w-. is not a valid range. To match a literal dash, put it first or last inside the brackets, or escape it as \-.
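The sketch below tries the usual ways of writing a literal dash inside square brackets and then triggers the range error on purpose. The address is a made-up example, and re.error is the exception class the re module raises for an invalid pattern.

```python
import re

address = "first.last-name@my-host.com"

# Three equivalent ways to include a literal dash in a character class.
dash_last = r"[\w.-]+"       # dash at the end
dash_first = r"[-\w.]+"      # dash at the start
dash_escaped = r"[\w\-.]+"   # dash escaped with a backslash

for cls in (dash_last, dash_first, dash_escaped):
    print(re.match(cls, address).group())   # the part before the @

# A dash between \w and . is read as a range and rejected at compile time.
try:
    re.compile(r"[\w-.]+")
except re.error as exc:
    print("re.error:", exc)
```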

Group Extraction

Let’s continue with the example on emails. What if you only want the username?

For this, you can provide an argument (like an index) to the group() method.

Take a look at this:

>>> match=re.search(r'([\w.-]+)@([\w-]+)\.([\w]+)','Please mail it to ayushiwasthere@gmail.com')
>>> match.group()

Output

'ayushiwasthere@gmail.com'

>>> match.group(1)

Output

‘ayushiwasthere’

>>> match.group(2)

Output

‘gmail’

>>> match.group(3)

Output

‘com’

Parentheses let you extract the parts you want. Note that for this, we divided the pattern into groups using parentheses:

r'([\w.-]+)@([\w-]+)\.([\w]+)'
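If you want all the groups at once, the match object also has a groups() method. Here is a short sketch using the same pattern and address; the variable names are only illustrative.

```python
import re

found = re.search(r"([\w.-]+)@([\w-]+)\.([\w]+)",
                  "Please mail it to ayushiwasthere@gmail.com")

# group(0) is the whole match; groups() returns groups 1..n as a tuple.
print(found.group(0))
user, domain, tld = found.groups()
print(user, domain, tld)

# group() also accepts several indices at once and returns a tuple.
print(found.group(1, 3))
```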

Python findall()

Above, we saw that Python regex search() stops at the first match.

But Python findall() returns a list of all matches found.

>>> match=re.findall(r'advi[cs]e','I could advise you on your poem, but you would disparage my advice')

We can then iterate on it.

>>> for i in match:
     print(i)

Output

advise
advice
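To make the difference from search() concrete, here is a small sketch on the same sentence. The last pattern is deliberately one that cannot match, to show what findall() returns in that case.

```python
import re

line = "I could advise you on your poem, but you would disparage my advice"

# search() yields only the first match...
print(re.search(r"advi[cs]e", line).group())

# ...while findall() collects every non-overlapping match as a list of strings.
print(re.findall(r"advi[cs]e", line))

# With no match, findall() returns an empty list rather than None,
# so looping over the result is always safe.
print(re.findall(r"advi[xz]e", line))
```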

>>> type(match)

Output

<class 'list'>

findall() with Files

We have worked with files, and we know how to read and write them. Why not make life easier by using Python findall() with files?

We’ll first use the os module to get to the desktop. Let’s see.

>>> import os
>>> os.chdir('C:\\Users\\lifei\\Desktop')
>>> f=open('Today.txt')

We have a file called Today.txt on our Desktop. These are its contents:

OS, DBMS, DS, ADA

HTML, CSS, jQuery, JavaScript

Python, C++, Java

This sem’s subjects

Now, let’s call findall().

>>> match=re.findall(r'Java[\w]*',f.read())

Finally, let’s iterate on it.

>>> for i in match:
      print(i)

Output

JavaScript
Java
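The same idea can be written so that the file is closed automatically. The sketch below is one way to do it, assuming the file contents shown above; it writes them to a temporary folder first so that it runs on any machine, which the original session did not need to do.

```python
import os
import re
import tempfile

contents = (
    "OS, DBMS, DS, ADA\n"
    "HTML, CSS, jQuery, JavaScript\n"
    "Python, C++, Java\n"
)

# Write the sample text to a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "Today.txt")
    with open(path, "w") as handle:
        handle.write(contents)

    # The with-block closes the file automatically after reading.
    with open(path) as handle:
        languages = re.findall(r"Java\w*", handle.read())

print(languages)
```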

findall() with Groups

We saw how we can divide a pattern into groups using parentheses. Watch what happens when we call Python Regex findall().

>>> match=re.findall(r'([\w]+)\s([\w]+)','Ayushi Sharma, Fluffy Sharma, Leo Sharma, Candy Sharma')
>>> for i in match:
   print(i)

Output

(‘Ayushi’, ‘Sharma’)

(‘Fluffy’, ‘Sharma’)

(‘Leo’, ‘Sharma’)

(‘Candy’, ‘Sharma’)
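In other words, when the pattern contains groups, findall() returns one tuple per match, with one item per group. A brief sketch of how you might unpack those tuples, and of what changes when the parentheses are removed:

```python
import re

names = "Ayushi Sharma, Fluffy Sharma, Leo Sharma, Candy Sharma"

# With two groups in the pattern, each match becomes a (first, last) tuple.
pairs = re.findall(r"(\w+)\s(\w+)", names)

for first, last in pairs:
    print(f"{last}, {first}")

# Without any groups, findall() returns the full matched text instead.
whole = re.findall(r"\w+\s\w+", names)
print(whole)
```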

Python Regex Options

The functions we discussed can also take an optional flags argument. These options are:

1. Python Regular Expression IGNORECASE

The IGNORECASE option makes the match case-insensitive.

Take this example of Python Regex IGNORECASE:

>>> match=re.findall(r'hi','Hi, did you ship it, Hillary?',re.IGNORECASE)
>>> for i in match:
      print(i)

Output

Hi
hi
Hi
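For comparison, this sketch runs the same search with and without the flag. It also uses re.I, the short alias the re module provides for re.IGNORECASE.

```python
import re

greeting = "Hi, did you ship it, Hillary?"

case_sensitive = re.findall(r"hi", greeting)
case_insensitive = re.findall(r"hi", greeting, re.IGNORECASE)

print(case_sensitive)     # only the lowercase 'hi' inside 'ship'
print(case_insensitive)   # also the 'Hi' in 'Hi' and in 'Hillary'

# re.I is the short alias for re.IGNORECASE.
print(re.findall(r"hi", greeting, re.I) == case_insensitive)
```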

2. Python MULTILINE

Working with a string of multiple lines, this allows ^ and $ to match the start and end of each line, not just the whole string.

>>> match=re.findall(r'^Hi','Hi, did you ship it, Hillary?\nNo, I didn\'t, but Hi',re.MULTILINE)
>>> for i in match:
      print(i)

Output

Hi
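In the sample string above, only the first line begins with ‘Hi’, so the result is the same with or without the flag. The sketch below uses a made-up three-line string in which the flag does change the outcome.

```python
import re

notes = "Hi, did you ship it?\nHi again\nNo, but Hi"

# Without MULTILINE, ^ matches only at the very start of the string.
start_only = re.findall(r"^Hi", notes)

# With MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r"^Hi", notes, re.MULTILINE)

# $ behaves the same way for line endings.
line_ends = re.findall(r"\w+$", notes, re.MULTILINE)

print(start_only)
print(every_line)
print(line_ends)
```

The first line ends in a question mark, which is not a word character, so \w+$ finds a word only at the ends of the second and third lines.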

3. Python DOTALL

By default, . does not match a newline, so a single .* match stops at the end of a line and cannot span a multiline string.

To allow this, we use DOTALL.

>>> match=re.findall(r'.*','Hi, did you ship it, Hillary?\nNo, I didn\'t, but Hi',re.DOTALL)
>>> for i in match:
     print(i)

Output

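Hi, did you ship it, Hillary?
No, I didn't, but Hi

The match now runs across the newline. findall() also returns an empty string at the very end, which prints as a blank line.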
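A side-by-side sketch makes the effect of the flag easier to see. It uses search() with .+ rather than findall() with .* so that each call yields exactly one non-empty match.

```python
import re

two_lines = "Hi, did you ship it, Hillary?\nNo, I didn't, but Hi"

# By default the dot stops at the newline, so the match covers line one only.
default = re.search(r".+", two_lines).group()

# With DOTALL the dot matches the newline too, so the match spans both lines.
dotall = re.search(r".+", two_lines, re.DOTALL).group()

print(repr(default))
print(repr(dotall))
```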

Greedy vs Non-Greedy

The quantifiers *, +, and ? are greedy. This means that they match as much text as possible. Let’s take an example.

>>> match=re.findall(r'(<.*>)','<em>Strong</em> <i>Italic</i>')
>>> for i in match:
     print(i)

Output

<em>Strong</em> <i>Italic</i>

This gave us the whole string, because it greedily keeps searching. What if we just want the opening and closing tags? Look:


>>> match=re.findall(r'(<.*?>)','<em>Strong</em> <i>Italic</i>')
>>> for i in match:
       print(i)

Output

<em>
</em>
<i>
</i>

A plain .* is greedy; adding a ? after it (.*?) makes it non-greedy, so it stops at the first > that lets the pattern match.

Alternatively, we could also do this:

>>> match=re.findall(r'</?\w+>','<em>Strong</em> <i>Italic</i>')
>>> for i in match:
     print(i)

Output

<em>
</em>
<i>
</i>

Here’s another example:

>>> match=re.findall('(a*?)b','aaabbc')
>>> for i in match:
     print(i)

Output

aaa

Here, the ? makes * non-greedy. Also, if we had left out the b after the ?, the pattern would have matched an empty string, because a non-greedy * is satisfied with zero characters.

The ? here needs something after it to stop at. This works for all three: *?, +?, and ??.

Similarly, {m,n}? is non-greedy and matches as few repetitions as possible.
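The sketch below puts the greedy and non-greedy forms next to each other, first on the HTML snippet from above and then on a made-up run of five a’s.

```python
import re

html = "<em>Strong</em> <i>Italic</i>"

# Greedy: .* grabs as much as it can, so one match runs to the last '>'.
greedy_tags = re.findall(r"<.*>", html)

# Non-greedy: .*? stops at the first '>' that lets the pattern succeed.
lazy_tags = re.findall(r"<.*?>", html)

print(greedy_tags)
print(lazy_tags)

# The same ? suffix works on the other quantifiers.
print(re.search(r"a{2,4}", "aaaaa").group())    # greedy: takes four
print(re.search(r"a{2,4}?", "aaaaa").group())   # non-greedy: takes two
print(re.search(r"a+?", "aaaaa").group())       # non-greedy: one is enough
```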

Substitution

We can use the sub() function to substitute a part of a string with another. sub() takes three arguments: the pattern, the replacement, and the string.

>>> re.sub('^a','an','a apple')

Output

‘an apple’

Here, we used ^ so it won’t change apple to anpple. The grammar police approve.
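To see what the anchor prevents, the sketch below runs the substitution with and without ^. It also shows the optional count argument of sub(), which is not covered above and limits the number of replacements.

```python
import re

phrase = "a apple"

# Anchored with ^, only the 'a' at the start of the string is replaced.
anchored = re.sub(r"^a", "an", phrase)

# Without the anchor, every 'a' is replaced, including the one in 'apple'.
unanchored = re.sub(r"a", "an", phrase)

# The optional count argument limits how many replacements are made.
limited = re.sub(r"a", "an", phrase, count=1)

print(anchored)
print(unanchored)
print(limited)
```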

Python Regex Applications

So, we learned so much about Python regular expressions, but where do we use them? They find use in these places:

Search engines
Find and Replace dialogs of word processors and text editors
Text processing utilities like sed and AWK
Lexical analysis

This was all about the Python Regex Tutorial

Python Interview Questions on Regular Expressions

What is a regular expression in Python? Explain with an example.
How do you use regular expressions in Python?
What does the question mark mean in a Python regular expression?
How do you split a string with a regular expression in Python?
How do you check whether a string matches a regular expression in Python?

Conclusion

These were the basics of Python regular expressions. Honestly, we think it is really cool to have such a tool in hand.

If you love English, try experimenting, and make a small project with it.


Aman says:
February 12, 2020 at 3:08 pm
In Greedy and Non Greedy part, please check your examples. While explaining the foundation of Greedy concept, your first example does not align to the concept.
- DataFlair Team says:
  August 29, 2023 at 3:03 pm
  We really appreciate your observation, we have noted your opinion and we will be making the necessary changes shortly. Thanks a lot aman for the feedback.
ubant says:
February 7, 2021 at 8:15 pm
Great, I finally understand it!
