Python Regex – Learn Python Regular Expression Functions


Regular expressions are one of my favourite Python topics, so let’s dive straight into this Python Regex tutorial.

Here, we will discuss metacharacters, examples, and functions of Python regex. Along the way, we will also cover findall() and the MULTILINE option.

So, let’s start a short Python Regex Cheat Sheet.


What is the Python Regular Expression (Regex)?

A regular expression, or regex, is a tiny search language baked into Python’s re module. Think of it as a smart magnifying glass that finds patterns in text, not just fixed words.

Want to spot all email addresses in a log file? A well-crafted regex can do it in one pass, saving hours of manual scanning.

A regex describes a search pattern, which a string-searching algorithm then uses for “find” or “find and replace” operations on strings. You will have seen a similar feature in Microsoft Word.

Patterns use symbols like . (any single character), * (zero or more repeats), and [] (character sets). By mixing these, you can build powerful filters such as r"\d{4}-\d{2}-\d{2}" to catch dates like 2025-07-14.
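
To make the date pattern concrete, here is a minimal sketch. The sample log text is invented for this example; only the pattern itself comes from the paragraph above.

```python
import re

# Hypothetical sample text, made up for this sketch.
log = "Backup started 2025-07-14, finished 2025-07-15. Ticket 12345-67 is unrelated."

# \d{4} means exactly four digits, \d{2} exactly two digits.
date_pattern = r"\d{4}-\d{2}-\d{2}"
dates = re.findall(date_pattern, log)
print(dates)  # ['2025-07-14', '2025-07-15']
```

The ticket number is skipped because it does not have the second hyphen the pattern asks for.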

Mastering regex unlocks fast data wrangling in web scraping, log parsing, and form validation.

In this Python Regex tutorial, we will learn the basics of regular expressions in Python. For this, we will use the ‘re’ module.

Let’s import it before we begin.

>>> import re

Python Regex – Metacharacters

Each character in a Python Regex is either a metacharacter or a regular character. A metacharacter has a special meaning, while a regular character matches itself.

Python has the following metacharacters:

Metacharacter	Description
^	Matches the start of the string
.	Matches any single character except a newline; inside square brackets, a dot matches a literal dot
[ ]	A bracket expression matches a single character from the ones inside it: [abc] matches ‘a’, ‘b’, or ‘c’; [a-z] matches any character from ‘a’ to ‘z’; [a-cx-z] matches ‘a’, ‘b’, ‘c’, ‘x’, ‘y’, or ‘z’
[^ ]	Matches a single character other than the ones mentioned in the brackets: [^abc] matches any character except ‘a’, ‘b’, and ‘c’
( )	Parentheses define a marked subexpression, also called a block, or a capturing group
\t, \n, \r, \f	Tab, newline, return, form feed
*	Matches the preceding element zero or more times: ab*c matches ‘ac’, ‘abc’, ‘abbc’, and so on; [ab]* matches ‘’, ‘a’, ‘b’, ‘ab’, ‘ba’, ‘aba’, and so on; (ab)* matches ‘’, ‘ab’, ‘abab’, ‘ababab’, and so on
{m,n}	Matches the preceding element at least m times and at most n times: a{2,4} matches ‘aa’, ‘aaa’, and ‘aaaa’
{m}	Matches the preceding element exactly m times
?	Matches the preceding element zero or one times: ab?c matches ‘ac’ or ‘abc’
+	Matches the preceding element one or more times: ab+c matches ‘abc’, ‘abbc’, ‘abbbc’, and so on, but not ‘ac’
|	The choice operator matches either the expression before it or the one after: abc|def matches ‘abc’ or ‘def’
\w	Matches a single word character (a-z, A-Z, 0-9, and the underscore); \W matches a single non-word character
\b	Matches the boundary between word and non-word characters
\s	Matches a single whitespace character \S matches a single non-whitespace character
\d	Matches a single decimal digit character (0-9)
\	A single backslash removes a character’s special meaning, for example \. \\ \*; when unsure whether a punctuation character has a special meaning, put a \ before it: \@
$	A dollar matches the end of the string
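
The quantifier rows in the table can be checked with a short sketch. It uses re.fullmatch(), which this tutorial does not otherwise cover; it is assumed here only because it requires the whole string to match, which makes the comparison easy to read.

```python
import re

candidates = ["ac", "abc", "abbc"]

# fullmatch() succeeds only if the entire string fits the pattern.
star = [s for s in candidates if re.fullmatch(r"ab*c", s)]      # zero or more b
plus = [s for s in candidates if re.fullmatch(r"ab+c", s)]      # one or more b
optional = [s for s in candidates if re.fullmatch(r"ab?c", s)]  # zero or one b

print(star)      # ['ac', 'abc', 'abbc']
print(plus)      # ['abc', 'abbc']
print(optional)  # ['ac', 'abc']
```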

A raw string literal does not treat backslashes in any special way. To get one, put an ‘r’ before the opening quote of the pattern.

Without this, you have to write '\\\\' to match a single backslash character. But with this, you only need r'\\'.
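
As a small sketch of the difference, the two patterns below both match one literal backslash. The sample path is an assumption made for the example.

```python
import re

path = r"C:\temp"  # sample text containing one literal backslash

# Plain string: Python and the regex engine each consume one level of escaping.
plain = re.search("\\\\", path)
# Raw string: only the regex-level escape is needed.
raw = re.search(r"\\", path)

print(plain.group() == raw.group())  # True
print(len(raw.group()))              # 1
```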

Regular characters match themselves.

Rules for a Match in regex

So, how does this work? The following rules must be met:

The search scans the string start to end.
The whole pattern must match, but not necessarily the whole string.
The search stops at the first match.

If a match is found, search() returns a match object, and its group() method returns the matching text. If not, search() returns None.

>>> print(re.search('na','no'))

Output

None
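
The three rules can be seen in a short sketch. The word used is arbitrary and chosen only because the pattern occurs in it twice.

```python
import re

text = "banana"

m = re.search(r"an", text)
print(m.group())  # an      (only the first occurrence is reported)
print(m.span())   # (1, 3)  (where in the string it was found)

# The pattern need not cover the whole string, but all of the pattern must match.
print(re.search(r"anz", text))  # None
```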

Let’s look at a couple of important functions now.

Python Regular Expression Functions

We have a few functions to help us use Python regex.

1. match() in python

match() takes two arguments- a pattern and a string. If the pattern matches at the beginning of the string, it returns a match object. Else, it returns None.

Let’s take a few Python regular expression match examples.

>>> print(re.match('center','centre'))

Output

None

>>> print(re.match(r'...\w\we','centre'))

Output

<_sre.SRE_Match object; span=(0, 6), match='centre'>

(Python 3.7 and later display this as <re.Match object; span=(0, 6), match='centre'>.)

2. search() in python

search(), like match(), takes two arguments- the pattern and the string to be searched. Unlike match(), it looks for the pattern anywhere in the string, not only at the beginning.
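
A minimal sketch of how the two functions differ on the same input. The sentence is invented for the example.

```python
import re

text = "the centre of town"

print(re.match(r"centre", text))           # None: 'centre' is not at the start
print(re.search(r"centre", text).group())  # centre
print(re.match(r"the", text).group())      # the
```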

Let’s take a few examples.

>>> match=re.search('aa?yushi','ayushi')
>>> match.group()

Output

'ayushi'

>>> match=re.search('aa?yushi?','ayush ayushi')
>>> match.group()

Output

'ayush'

>>> match=re.search(r'\w*end','Hey! What are your plans for the weekend?')
>>> match.group()

Output

'weekend'

>>> match=re.search(r'^\w*end','Hey! What are your plans for the weekend?')
>>> match.group()

Output

Traceback (most recent call last):
  File "<pyshell#337>", line 1, in <module>
    match.group()
AttributeError: 'NoneType' object has no attribute 'group'

Here, an AttributeError was raised because search() found no match and returned None. This is because the ^ specifies that the pattern must appear at the beginning of the string.
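
One way to avoid this error is to check the result before calling group(). This is a minimal sketch; the fallback text is a placeholder chosen for the example.

```python
import re

text = "Hey! What are your plans for the weekend?"

match = re.search(r"^\w*end", text)
if match is not None:
    result = match.group()
else:
    result = "no match"  # placeholder value chosen for this sketch

print(result)  # no match
```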

Let’s try searching for a space.

>>> match=re.search(r'i\sS','Ayushi Sharma')
>>> match.group()

Output

'i S'

>>> match=re.search(r'\w+c{2}\w*','Occam\'s Razor')
>>> match.group()

Output

'Occam'

It takes some practice to remember what each metacharacter means.

But since there aren’t that many of them, this should hardly take an hour.

Python Regex Examples

Let’s try crafting a Python regex for an email address. Hmm, so what does one look like? It looks like this: ayushiwasthere@gmail.com

Let’s try the following code:

>>> match=re.search(r'[\w.-]+@[\w-]+\.[\w]+','Please mail it to ayushiwasthere@gmail.com')
>>> match.group()

Output

'ayushiwasthere@gmail.com'

It worked perfectly!

Here, if you had typed [\w-.] instead of [\w.-], it would have raised the following error:

>>> match=re.search(r'[\w-.]+@[\w-]+\.[\w]+','Please mail it to ayushiwasthere@gmail.com')

Output

Traceback (most recent call last):
  File "<pyshell#347>", line 1, in <module>
    match=re.search(r'[\w-.]+@[\w-]+\.[\w]+','Please mail it to ayushiwasthere@gmail.com')
  File "C:\Users\lifei\AppData\Local\Programs\Python\Python36-32\lib\re.py", line 182, in search
    return _compile(pattern, flags).search(string)
  File "C:\Users\lifei\AppData\Local\Programs\Python\Python36-32\lib\re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\lifei\AppData\Local\Programs\Python\Python36-32\lib\sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\lifei\AppData\Local\Programs\Python\Python36-32\lib\sre_parse.py", line 856, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, False)
  File "C:\Users\lifei\AppData\Local\Programs\Python\Python36-32\lib\sre_parse.py", line 415, in _parse_sub
    itemsappend(_parse(source, state, verbose))
  File "C:\Users\lifei\AppData\Local\Programs\Python\Python36-32\lib\sre_parse.py", line 547, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
sre_constants.error: bad character range \w-. at position 1

This is because, inside square brackets, a dash (-) between two items normally indicates a range, and \w-. is not a valid range. Placing the dash first or last in the brackets, or escaping it as \-, makes it a literal dash.
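
The sketch below shows three placements of the dash that are read as a literal character, and confirms that the problematic form is rejected. The address is hypothetical and exists only to include a dash.

```python
import re

text = "Please mail it to first-last.name@example-mail.com"  # hypothetical address

# Dash last, dash first, or escaped: each is treated as a literal '-'.
patterns = [r"[\w.-]+@", r"[-\w.]+@", r"[\w\-.]+@"]
results = [re.search(p, text).group() for p in patterns]
print(results)

# A dash between \w and . is read as a range and fails to compile.
try:
    re.compile(r"[\w-.]+@")
    bad_range_rejected = False
except re.error as exc:
    bad_range_rejected = True
    print("rejected:", exc)
```

Putting the dash last, as the tutorial does, is probably the least error-prone habit.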

Group Extraction in regex

Let’s continue with the example on emails. What if you only want the username?

For this, you can pass an argument (the index of a group) to the group() method.

Take a look at this:

>>> match=re.search(r'([\w.-]+)@([\w-]+)\.([\w]+)','Please mail it to ayushiwasthere@gmail.com')
>>> match.group()

Output

'ayushiwasthere@gmail.com'

>>> match.group(1)

Output

'ayushiwasthere'

>>> match.group(2)

Output

'gmail'

>>> match.group(3)

Output

'com'

Parentheses let you extract the parts you want. Note that for this, we divided the pattern into groups using parentheses:

r'([\w.-]+)@([\w-]+)\.([\w]+)'
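
As an optional extension, the re module also lets you fetch all groups at once with groups() and give groups names with the (?P<name>...) syntax. The group names user, host, and tld below are assumptions chosen for this sketch, and the address is rebuilt from the group outputs shown above.

```python
import re

text = "Please mail it to ayushiwasthere@gmail.com"

# Same pattern as above, with illustrative names attached to each group.
match = re.search(r"(?P<user>[\w.-]+)@(?P<host>[\w-]+)\.(?P<tld>\w+)", text)

parts = match.groups()      # every group as one tuple
user = match.group("user")  # look a group up by name instead of by index

print(parts)  # ('ayushiwasthere', 'gmail', 'com')
print(user)   # ayushiwasthere
```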

Python findall()

Above, we saw that Python regex search() stops at the first match.

But Python findall() returns a list of all matches found.

>>> match=re.findall(r'advi[cs]e','I could advise you on your poem, but you would disparage my advice')

We can then iterate on it.

>>> for i in match:
     print(i)

Output

advise
advice

>>> type(match)

Output

<class 'list'>

findall() with Files in Python

We have worked with files, and we know how to read and write them. Why not make life easier by using Python findall() with files?

We’ll first use the os module to get to the desktop. Let’s see.

>>> import os
>>> os.chdir('C:\\Users\\lifei\\Desktop')
>>> f=open('Today.txt')

We have a file called Today.txt on our Desktop. These are its contents:

OS, DBMS, DS, ADA

HTML, CSS, jQuery, JavaScript

Python, C++, Java

This sem’s subjects

Now, let’s call findall().

>>> match=re.findall(r'Java[\w]*',f.read())

Finally, let’s iterate on it.

>>> for i in match:
      print(i)

Output

JavaScript
Java
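
The same search can be written with a with block, which closes the file automatically. This is a sketch under one assumption: instead of relying on a file on the Desktop, it recreates Today.txt in a temporary directory so that it runs anywhere.

```python
import os
import re
import tempfile

# Contents of the sample file, copied from the listing above.
contents = (
    "OS, DBMS, DS, ADA\n"
    "HTML, CSS, jQuery, JavaScript\n"
    "Python, C++, Java\n"
    "This sem's subjects\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Today.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(contents)
    # The file is closed as soon as this block ends.
    with open(path, encoding="utf-8") as f:
        matches = re.findall(r"Java\w*", f.read())

print(matches)  # ['JavaScript', 'Java']
```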

findall() with Groups in Python

We saw how we can divide a pattern into groups using parentheses. Watch what happens when we call Python Regex findall().

>>> match=re.findall(r'([\w]+)\s([\w]+)','Ayushi Sharma, Fluffy Sharma, Leo Sharma, Candy Sharma')
>>> for i in match:
   print(i)

Output

('Ayushi', 'Sharma')
('Fluffy', 'Sharma')
('Leo', 'Sharma')
('Candy', 'Sharma')
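
Because each result is a tuple, it can be unpacked directly in a loop or comprehension. A minimal sketch, which also shows what findall() returns when the pattern has no groups:

```python
import re

names = "Ayushi Sharma, Fluffy Sharma, Leo Sharma, Candy Sharma"

# Two groups: findall() returns a list of 2-tuples.
pairs = re.findall(r"(\w+)\s(\w+)", names)
first_names = [first for first, last in pairs]
print(first_names)  # ['Ayushi', 'Fluffy', 'Leo', 'Candy']

# No groups: findall() returns the whole matched text instead.
whole = re.findall(r"\w+\s\w+", names)
print(whole[0])  # Ayushi Sharma
```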

Python Regex Options

The functions we discussed also accept an optional flags argument. Some commonly used flags are:

1. Python Regular Expression IGNORECASE

This flag makes the pattern ignore case while matching.

Take this example of Python Regex IGNORECASE:

>>> match=re.findall(r'hi','Hi, did you ship it, Hillary?',re.IGNORECASE)
>>> for i in match:
      print(i)

Output

Hi
hi
Hi

2. Python MULTILINE

With the MULTILINE flag, ^ and $ match at the start and end of each line of a multi-line string, not just at the start and end of the whole string.
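
To see the effect of the flag, it helps to run one pattern with and without it. The sketch below uses ^\w+ (the first word of a line), a pattern chosen for illustration because the second line of the sample text starts with a different word.

```python
import re

text = "Hi, did you ship it, Hillary?\nNo, I didn't, but Hi"

# Without the flag, ^ only matches at the very start of the string.
without_flag = re.findall(r"^\w+", text)
# With the flag, ^ also matches right after each newline.
with_flag = re.findall(r"^\w+", text, re.MULTILINE)

print(without_flag)  # ['Hi']
print(with_flag)     # ['Hi', 'No']
```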

>>> match=re.findall(r'^Hi','Hi, did you ship it, Hillary?\nNo, I didn\'t, but Hi',re.MULTILINE)
>>> for i in match:
      print(i)

Output

Hi

3. Python DOTALL

By default, .* does not run across a multiline string; it stops at the end of the current line. This is because . does not match a newline.

To allow this, we use DOTALL.

>>> match=re.findall(r'.*','Hi, did you ship it, Hillary?\nNo, I didn\'t, but Hi',re.DOTALL)
>>> for i in match:
     print(i)

Output

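Hi, did you ship it, Hillary?
No, I didn't, but Hi

findall() also reports one empty match at the very end of the string, which prints as a blank line.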
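
For comparison, here is a short sketch of the same pattern with and without DOTALL. It uses search() rather than findall() so that only the first match is shown.

```python
import re

text = "Hi, did you ship it, Hillary?\nNo, I didn't, but Hi"

first_line = re.search(r".*", text).group()             # stops at the newline
everything = re.search(r".*", text, re.DOTALL).group()  # runs across the newline

print(first_line)          # Hi, did you ship it, Hillary?
print(everything == text)  # True
```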

Greedy vs Non-Greedy

A greedy quantifier matches as much text as it can, while a non-greedy one matches as little as it can. Each has advantages and disadvantages:

1. Greedy:

Advantage: It captures as much text as possible and is usually fast for simple patterns.
Disadvantage: It might capture more text than required.

2. Non-greedy:

Advantage: It stops at the shortest text that lets the rest of the pattern match.
Disadvantage: It can be slower on long inputs, because it tests the rest of the pattern after each additional character.

The metacharacters *, +, and ? are greedy by default. This means that they match as many repetitions as they can. Let’s take an example.

>>> match=re.findall(r'(<.*>)','<em>Strong</em> <i>Italic</i>')
>>> for i in match:
     print(i)

Output

<em>Strong</em> <i>Italic</i>

This gave us the whole string, because .* greedily matches as much as it can. What if we just want the opening and closing tags? Look:

>>> match=re.findall(r'(<.*?>)','<em>Strong</em> <i>Italic</i>')
>>> for i in match:
       print(i)

Output

<em>
</em>
<i>
</i>

The .* is greedy, and the ? makes it non-greedy.

Alternatively, we could also do this:

>>> match=re.findall(r'</?\w+>','<em>Strong</em> <i>Italic</i>')
>>> for i in match:
     print(i)

Output

<em>
</em>
<i>
</i>

Here’s another example:

>>> match=re.findall('(a*?)b','aaabbc')
>>> for i in match:
     print(i)

Output

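aaa

The second match is an empty string, found just before the second b, so it prints as a blank line.

Here, the ? makes * non-greedy. Also, if we had left out the b after the ?, a*? would have matched only empty strings, because a non-greedy quantifier is satisfied by as few characters as possible.

A non-greedy quantifier needs something after it in the pattern to make it extend. This works for all three- *?, +?, and ??.

Similarly, {m,n}? is the non-greedy form of {m,n}, and matches as few occurrences as possible.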
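
The difference is easiest to see by collecting the results in lists. A minimal sketch using the tag example from this section, plus one counted repetition:

```python
import re

html = "<em>Strong</em> <i>Italic</i>"

greedy = re.findall(r"<.*>", html)  # one match spanning the whole string
lazy = re.findall(r"<.*?>", html)   # four separate tags

print(greedy)  # ['<em>Strong</em> <i>Italic</i>']
print(lazy)    # ['<em>', '</em>', '<i>', '</i>']

# The same idea applies to counted repetition.
most = re.search(r"a{2,4}", "aaaaa").group()
fewest = re.search(r"a{2,4}?", "aaaaa").group()
print(most)    # aaaa
print(fewest)  # aa
```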

Substitution

We can use the sub() function to substitute part of a string with another. sub() takes three arguments- the pattern, the replacement, and the string.

>>> re.sub('^a','an','a apple')

Output

'an apple'

Here, we used ^ so it won’t change apple to anpple. The grammar police approve.
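
sub() can also reuse captured text in the replacement and limit how many replacements it makes. The sketch below assumes a made-up sentence; \1 refers to the first group, and count is an optional argument of re.sub().

```python
import re

text = "a apple, a orange, a pear"

# \b keeps the match to the standalone word 'a'; \1 puts the captured vowel back.
fixed = re.sub(r"\ba ([aeiou])", r"an \1", text)
print(fixed)  # an apple, an orange, a pear

# count=1 stops after the first replacement.
once = re.sub(r"\ba ([aeiou])", r"an \1", text, count=1)
print(once)   # an apple, a orange, a pear
```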

Python Regex Applications

So, we learned so much about Python regular expressions, but where do we use them? They find use in these places:

Search engines
Find and Replace dialogues of word processors and text editors
Text processing utilities like sed and AWK
Lexical analysis

This was all about the Python Regex Tutorial

Python Interview Questions on Regular Expressions

1. What is a regular expression in Python? Explain with an example.

2. How to use regular expressions in Python?

3. What is the meaning of the question mark in a regular expression in Python?

4. How do you split a string using a regular expression in Python?

5. How do you check whether a string matches a regular expression in Python?

Conclusion

These were the basics of Python regular expressions. Honestly, we think it is really cool to have such a tool in hand.

If you love English, try experimenting and make a small project with it.


DataFlair Team


Aman says:
February 12, 2020 at 3:08 pm
In Greedy and Non Greedy part, please check your examples. While explaining the foundation of Greedy concept, your first example does not align to the concept.
- DataFlair Team says:
  August 29, 2023 at 3:03 pm
  We really appreciate your observation, we have noted your opinion and we will be making the necessary changes shortly. Thanks a lot aman for the feedback.
ubant says:
February 7, 2021 at 8:15 pm
Great, I finally understand it!
