How to Use Regular Expressions in Python
Jobsity
Posted on February 25, 2022
Regular expressions are commonly abbreviated as RegEx. A RegEx is a text string that defines a search pattern. Regular expressions are useful for extracting information from text such as code, files, logs, spreadsheets, or documents. In simple words, a RegEx is a combination of letters, symbols, and numbers you can use to search for things within a longer text.
This article sheds light on how to use regular expressions in Python.
When you use regular expressions in Python, the first thing to understand is that everything is a character, and that you write patterns to match specific sequences of characters, also known as strings.
In the programming world, both newbie and experienced developers often ask how vital learning RegEx is. Today our quest is to find out the importance of RegEx in the context of development with some simple and relevant examples.
Let me introduce some fundamental RegEx metacharacters and what they are used for. To learn more about RegEx syntax, you may check the Official Documentation.
Character(s) | What it does in Python |
. - A period. | Matches any single character except the newline character. |
^ - A caret. | Matches at the start of the string (and, in MULTILINE mode, at the start of each line). |
\A - Uppercase A. | Matches only at the start of the string. |
$ - Dollar sign. | Matches at the end of the string or just before a trailing newline (and, in MULTILINE mode, at the end of each line). |
\Z - Uppercase Z. | Matches only at the end of the string. |
[] | Matches any single character from the set you specify within it. |
\ - Backslash. | Drops the special meaning of the character following it. |
\w - Lowercase w. | Matches any single letter, digit, or underscore. |
\W - Uppercase W. | Matches any single character that \w does not match. |
\s - Lowercase s. | Matches a single whitespace character such as space, newline, tab, or carriage return. |
\S - Uppercase S. | Matches any single character that \s does not match. |
\d - Lowercase d. | Matches a decimal digit 0-9. |
\D - Uppercase D. | Matches any single character that is not a decimal digit. |
\t - Lowercase t. | Matches a tab. |
\n - Lowercase n. | Matches a newline. |
\r - Lowercase r. | Matches a carriage return. |
\b - Lowercase b. | Matches only at the beginning or end of a word (a word boundary). |
+ | Matches one or more occurrences of the preceding character or group. |
* | Matches zero or more occurrences of the preceding character or group. |
? | Matches zero or one occurrence of the preceding character or group. |
{} | Matches an explicit number of occurrences, for example {3} or {2,4}. |
() | Creates a group when performing matches. |
(?P<name>...) | Creates a named group when performing matches. |
Note: The above table doesn't cover every aspect of RegEx. Don't worry if you can't wrap your head around all the metacharacters for now. With time and practice, you will understand what each of these characters does and learn when to use them.
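A few of the metacharacters from the table can be tried out directly. The following is a minimal sketch; the sample string log_line and the variable names are made up for illustration and are not part of the re module.

```python
import re

# Hypothetical sample text, used only for illustration.
log_line = "2022-02-25 ERROR disk_full on node7"

# ^ anchors the match to the start of the string; \d{4} matches exactly four digits.
year = re.search(r"^\d{4}", log_line).group()

# \b marks a word boundary; \w+ matches one or more letters, digits, or underscores.
words = re.findall(r"\b\w+\b", log_line)

# [A-Z]+ matches one or more characters from the set of uppercase letters.
level = re.search(r"[A-Z]+", log_line).group()

# (?P<number>...) creates a named group; $ anchors the match to the end of the string.
node = re.search(r"node(?P<number>\d+)$", log_line).group("number")

print(year)   # 2022
print(words)  # ['2022', '02', '25', 'ERROR', 'disk_full', 'on', 'node7']
print(level)  # ERROR
print(node)   # 7
```

Each pattern is written as a raw string (the r prefix) so that Python passes backslashes such as \d and \b to the RegEx engine unchanged.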
Python comes with a module named re to work with RegEx. The basic syntax of importing the RegEx module is:
import re
The re module provides many functions. In this tutorial, I’m going to discuss three of the most practical and widely used ones.
re.match() Function
The re.match() function of the re module checks for a match only at the beginning of the string. If the pattern matches there, it returns a match object. If the pattern only occurs later in the string, including at the start of a later line, re.match() returns None. Please allow me to show you a quick example now.
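import re
pattern = "C"
sequence1 = "IceCream"
sequence2 = "Catch"
# "IceCream" does not start with "C", so re.match() returns None.
print("Sequence 1:", re.match(pattern, sequence1))
# "Catch" starts with "C"; .group() returns the matched text.
print("Sequence 2:", re.match(pattern, sequence2).group())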
Output:
Sequence 1: None
Sequence 2: C
In the above example, the function re.match() returns a match object when the pattern (C) matches at the beginning of the string, and None otherwise. For the first sequence we get None because the string IceCream does not start with C, even though C appears later in the string.
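Because re.match() returns None on failure, calling .group() on the result without checking it first raises an AttributeError. The sketch below shows one way to guard against that; the helper name starts_with_capital_c and the sample strings are hypothetical.

```python
import re

def starts_with_capital_c(text):
    """Return True if text begins with 'C' (hypothetical helper)."""
    match = re.match(r"C", text)
    # re.match() returns None on failure, so test the result before using it.
    return match is not None

print(starts_with_capital_c("Catch"))     # True
print(starts_with_capital_c("IceCream"))  # False

# Even with re.MULTILINE, re.match() only tries the very start of the string,
# so the "C" that begins the second line is not matched.
two_lines = "Ice\nCream"
multiline_result = re.match(r"C", two_lines, re.MULTILINE)
print(multiline_result)                   # None
```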
re.search() Function
The re.search() function scans through the string and returns the first location where the regular expression pattern matches. Unlike the re.match() function, it is not limited to the beginning of the string. The Python re.search() function returns a match object when the pattern is found and None when the pattern is not found in the string.
Here is a simple example for your better understanding.
import re
patterns = ["Software testing", "Dan"]
my_string = "Software testing is a tough job"
for pattern in patterns:
    print("Looking for '%s' in '%s' " % (pattern, my_string), end='')
    # re.search() returns a match object (truthy) or None (falsy).
    if re.search(pattern, my_string):
        print("Here is a match case")
    else:
        print("Not a match")
Output:
Looking for 'Software testing' in 'Software testing is a tough job' Here is a match case
Looking for 'Dan' in 'Software testing is a tough job' Not a match
I set two patterns in the above Python program and searched for them in my_string. The first pattern was found, so re.search() returned a match object and the program printed the match message. The second pattern was not found, so re.search() returned None and the program printed "Not a match".
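The match object returned by re.search() carries more than a yes-or-no answer. As a small sketch reusing the same sentence (the pattern "tough" is chosen here only for illustration), it can report what was matched and where:

```python
import re

my_string = "Software testing is a tough job"
match = re.search(r"tough", my_string)

if match:
    matched_text = match.group()  # the text that matched the pattern
    position = match.span()       # (start, end) indexes of the match
    print(matched_text)           # tough
    print(position)               # (22, 27)
```

Slicing the original string with those indexes, my_string[22:27], gives back the matched text.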
re.findall() Function
In Python, the re.findall() function returns a list containing all the non-overlapping matches of the pattern in the input text, in the order they are found. The re.search() function stops at the first occurrence. In contrast, the re.findall() function scans the entire string and returns every match in a single step. If the pattern contains capturing groups, the list holds the group contents instead of the full matches. If the pattern is not found in the text, re.findall() returns an empty list.
Let’s write a simple Python program to understand the re.findall() function’s behavior.
import re
my_string1 = "Hello 10 Dan 20. Howdy? 30"
my_string2 = "Hello Dan Howdy?"
pattern = r'\d+'  # raw string, so the backslash reaches the RegEx engine unchanged
result1 = re.findall(pattern, my_string1)
result2 = re.findall(pattern, my_string2)
print(result1)
print(result2)
Output:
['10', '20', '30']
[]
If you look closely at the above example, you'll find two different strings tested against a single pattern. For the first string, the re.findall() function found matches and returned the value ['10', '20', '30']. The second string, on the other hand, contains no digits. Therefore, the function returned an empty list [].
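What re.findall() puts in the list depends on whether the pattern contains groups. The following sketch illustrates the three cases; the sample text and variable names are assumptions made for this example.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Dan=10, Eve=20, Sam=30"

# No groups: each item is the full match.
full_matches = re.findall(r"\w+=\d+", text)

# One group: each item is the text captured by that group.
numbers = re.findall(r"\w+=(\d+)", text)

# Two groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(full_matches)  # ['Dan=10', 'Eve=20', 'Sam=30']
print(numbers)       # ['10', '20', '30']
print(pairs)         # [('Dan', '10'), ('Eve', '20'), ('Sam', '30')]
```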
Regular expressions are a vast topic, and mastering them takes a lot of practice. In this tutorial, I tried to give you a quick overview of RegEx and its essential functions. The re module has many more advanced functions that you can explore later on. To become proficient in RegEx and its functions, solving exercises is a must!