Working With Regular Expressions In Python
I write content for AWS, Kubernetes, Python, JavaScript and more. To view all the latest content, be sure to visit my blog and subscribe to my newsletter. Follow me on Twitter.
This is Day 14 of the #100DaysOfPython challenge
This post will use the regular expressions module from the standard library to ... .
- Familiarity with Pipenv. See here for my post on Pipenv.
- Familiarity with JupyterLab. See here for my post on JupyterLab.
- Familiarity with Regular Expressions
Let's create the hello-python-regex directory and install JupyterLab.
# Make the `hello-python-regex` directory
$ mkdir hello-python-regex
$ cd hello-python-regex
# Create a folder to hold the notebook
$ mkdir docs
# Init the virtual environment
$ pipenv --python 3
$ pipenv install --dev jupyterlab
At this stage, we can start up the notebook server.
# Startup the notebook server
$ pipenv run jupyter-lab
# ... Server is now running on http://localhost:8888/lab
The server will now be up and running.
Once on http://localhost:8888/lab, create a new Python 3 notebook from the launcher.
Ensure that this notebook is saved in hello-python-regex/docs/regex.ipynb.
We will explore the following in each cell of the notebook:
- Importing the Regex module.
- A basic usage of the Regex module.
- String replacement with the Regex module.
This imports the regex module from the standard library.
import re
m = re.search("Hello, (.+)", "Hello, world!")
m.group(1)
# 'world!'
The re module provides a number of useful functions. We will demonstrate:
- Searching strings.
- Matching strings.
- Usage without compile.
- Splitting a string into a list.
- Replacing matches.
Scan through string looking for the first location where this regular expression produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
import re
pattern = re.compile("ello, (.+)")
m = pattern.search("Hello, world!")
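# The pattern starts at "e", so the match begins at index 1, not 0
print(m) # <re.Match object; span=(1, 13), match='ello, world!'>
print(m.group(1)) # world!
# The optional second argument is the index at which the search starts
n = pattern.search("Hello, world!", 0)
print(n) # <re.Match object; span=(1, 13), match='ello, world!'>
print(n.group(1)) # world!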
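Because search returns None when nothing matches, calling group directly on the result can raise an AttributeError. Here is a minimal sketch of guarding against that. The helper name first_group is hypothetical and not part of the re module.

```python
import re

pattern = re.compile("ello, (.+)")

def first_group(text):
    """Return the first captured group, or None when the pattern is not found."""
    m = pattern.search(text)
    if m is None:
        return None
    return m.group(1)

print(first_group("Hello, world!"))  # world!
print(first_group("Goodbye!"))  # None
```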
If zero or more characters at the beginning of string match this regular expression, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.
pattern = re.compile("ello, (.+)")
# No match, as "e" is the 2nd character of "Hello, world!" and match only checks the beginning.
m = pattern.match("Hello, world!")
print(m) # None
pattern = re.compile("Hello, (.+)")
# Does match
n = pattern.match("Hello, world!")
print(n) # <re.Match object; span=(0, 13), match='Hello, world!'>
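To make the difference between match and search concrete, here is a small side-by-side sketch that uses the same pattern and string for both. It also uses the optional start position that compiled patterns accept. The variable names are just for illustration.

```python
import re

pattern = re.compile("ello, (.+)")
text = "Hello, world!"

# match only succeeds if the pattern matches starting at index 0
print(pattern.match(text))  # None

# search scans forward and finds the pattern starting at index 1
found = pattern.search(text)
print(found.span())  # (1, 13)

# With a start position of 1, match is attempted from that index instead
anchored = pattern.match(text, 1)
print(anchored.group(1))  # world!
```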
When you use re.match and re.search as module-level functions, you pass the regex as the first argument and the string to check as the second:
m = re.match("Hello, (.+)", "Hello, world!")
print(m) # <re.Match object; span=(0, 13), match='Hello, world!'>
n = re.search("Hello, (.+)", "Hello, world!")
print(n) # <re.Match object; span=(0, 13), match='Hello, world!'>
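As a quick check that the two styles are interchangeable, the sketch below runs the same pattern through a compiled pattern object and through the module-level function, then compares the results. Compiling is mostly worthwhile when the same pattern is reused many times.

```python
import re

text = "Hello, world!"

# Compile once, then call methods on the pattern object
compiled = re.compile(r"Hello, (.+)")
a = compiled.search(text)

# Or pass the pattern string directly to the module-level function
b = re.search(r"Hello, (.+)", text)

print(a.group(1) == b.group(1))  # True
print(a.span(), b.span())  # (0, 13) (0, 13)
```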
m = re.split(",", "Hello, world!")
print(m) # ['Hello', ' world!']
n = re.split(r"\s", "Hello beautiful world!")
print(n) # ['Hello', 'beautiful', 'world!']
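A few variations on splitting may be useful in practice. The sketch below uses made-up input strings to show how a single \s behaves with consecutive spaces, how \s+ collapses runs of whitespace, and how the maxsplit argument limits the number of splits.

```python
import re

# A single \s produces empty strings between consecutive spaces
single = re.split(r"\s", "Hello  world!")
print(single)  # ['Hello', '', 'world!']

# \s+ treats a run of whitespace as one separator
collapsed = re.split(r"\s+", "Hello   beautiful\tworld!")
print(collapsed)  # ['Hello', 'beautiful', 'world!']

# maxsplit stops splitting after the given number of splits
limited = re.split(r",\s*", "a, b, c, d", maxsplit=2)
print(limited)  # ['a', 'b', 'c, d']
```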
We can make use of the search and sub functions to replace matches.
# Simple example
target = "Photo credit by [@thomas](https://site.com/@thomas)"
m = re.search(r"Photo credit by \[@(.+)\]\(https://site.com/@(.+)\)", target)
res = re.sub(m.group(1), "dennis", target)
print(res) # Photo credit by [@dennis](https://site.com/@dennis)
# By iterating for multiple matches
target = "Photo credit by [@thomas](https://site.com/@user)"
m = re.search(r"Photo credit by \[@(.+)\]\(https://site.com/@(.+)\)", target)
res = target
for val in m.groups():
    res = re.sub(val, "dennis", res)
print(res) # Photo credit by [@dennis](https://site.com/@dennis)
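One caveat with passing captured text to re.sub is that it gets interpreted as a regex pattern. That is fine for plain names, but a name containing a metacharacter such as "." can match more than intended. Here is a minimal sketch using re.escape, with a hypothetical username and target string chosen to show the difference.

```python
import re

target = "Photo credit by [@j.doe](https://site.com/@j.doe), not jXdoe"
name = "j.doe"  # hypothetical username containing a regex metacharacter

# Unescaped, "." matches any character, so "jXdoe" is replaced too
unescaped = re.sub(name, "dennis", target)
print(unescaped)  # Photo credit by [@dennis](https://site.com/@dennis), not dennis

# re.escape makes the name match literally
escaped = re.sub(re.escape(name), "dennis", target)
print(escaped)  # Photo credit by [@dennis](https://site.com/@dennis), not jXdoe
```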
For a more specific replacement (particularly in a large set of text), we can be more explicit with the string to replace:
target = """
Other words thomas and user we don't want to replace.
Photo credit by [@thomas](https://site.com/@user)
"""
new_name = "dennis"
pattern = re.compile(r"Photo credit by \[@(.+)\]\(https://site.com/@(.+)\)")
res = pattern.sub(f"Photo credit by [@{new_name}](https://site.com/@{new_name})", target)
print(res)
# Other words thomas and user we don't want to replace.
# Photo credit by [@dennis](https://site.com/@dennis)
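The (.+) groups in this pattern are greedy, which works when each credit sits on its own line but can swallow too much when two credits share a line. The sketch below compares it with a stricter variant that uses character classes. The stricter pattern assumes usernames never contain "]" or ")", and the sample text is made up.

```python
import re

text = (
    "Photo credit by [@thomas](https://site.com/@user) and "
    "Photo credit by [@anna](https://site.com/@anna)"
)
replacement = "Photo credit by [@dennis](https://site.com/@dennis)"

# Greedy groups stretch across both credits and treat them as one match
greedy = re.compile(r"Photo credit by \[@(.+)\]\(https://site.com/@(.+)\)")
greedy_res = greedy.sub(replacement, text)
print(greedy_res)  # Photo credit by [@dennis](https://site.com/@dennis)

# Character classes stop at the closing bracket or parenthesis
strict = re.compile(r"Photo credit by \[@([^\]]+)\]\(https://site\.com/@([^)]+)\)")
strict_res = strict.sub(replacement, text)
print(strict_res.count("dennis"))  # 4
```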
Today's post demonstrated how to use the re module from the standard library to search, match, split and replace text in Python strings.
This can be unbelievably useful when working with text files.
Photo credit: pawel_czerwinski
Originally posted on my blog. To see new posts without delay, read the posts there and subscribe to my newsletter.