
Regular Expressions

Regular expressions (called REs, or regexes, or regex patterns) are essentially a tiny, highly specialized programming language embedded inside Python and made available through the re module.

Using this little language, you specify the rules for the set of possible strings that you want to match; this set might contain English sentences, or e-mail addresses, or TeX commands, or anything you like. You can then ask questions such as “Does this string match the pattern?”, or “Is there a match for the pattern anywhere in this string?”. You can also use REs to modify a string or to split it apart in various ways.
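
The two questions above correspond to different functions in the re module. As a minimal sketch (the digit pattern and the sample strings are illustrative assumptions, not part of the original text), re.fullmatch requires the whole string to match, while re.search looks for a match anywhere in it:

```python
import re

# Illustrative pattern for this sketch: one or more digits
pattern = r"[0-9]+"

# "Does this string match the pattern?" -> the whole string must match
whole = re.fullmatch(pattern, "2024")
partial = re.fullmatch(pattern, "year 2024")

# "Is there a match for the pattern anywhere in this string?"
anywhere = re.search(pattern, "year 2024")

print(whole is not None)    # True
print(partial is not None)  # False
print(anywhere.group())     # 2024
```

Both functions return a match object on success and None otherwise.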

Find All Matches

import re

pattern = "ab"
content = "abcabcdbab"
results = re.findall(pattern, content)
print(results) # ['ab', 'ab', 'ab']

pattern = "[0-9_]+" # one or more consecutive digits or underscores
content = "56abc789h__31"
results = re.findall(pattern, content)
print(results) # ['56', '789', '__31']

# ignore case
pattern = "ab"
content = "AbcabcdbaBB"
results = re.findall(pattern, content, re.IGNORECASE)
print(results) # ['Ab', 'ab', 'aB']
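
re.findall returns only the matched text. When the position of each match also matters, re.finditer may be a better fit, since it yields match objects. A brief sketch reusing the digit-or-underscore pattern from above:

```python
import re

pattern = r"[0-9_]+"
content = "56abc789h__31"

# Collect (start, end, text) for every match
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, content)]
print(spans)  # [(0, 2, '56'), (5, 8, '789'), (9, 13, '__31')]
```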

Get First Match

import re

pattern = "[0-9_]+"
content = "56abc789h__31"
result = re.search(pattern, content)
print(result) # <re.Match object; span=(0, 2), match='56'>
print(result.group()) # 56
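
re.search returns None when nothing matches, so calling .group() on the result directly would raise an AttributeError in that case. A small sketch of the check, using an illustrative input string that contains no digits or underscores:

```python
import re

pattern = r"[0-9_]+"
content = "no digits here"  # illustrative input with nothing to match

result = re.search(pattern, content)
# Guard against the no-match case before calling .group()
first = result.group() if result is not None else None
print(first)  # None
```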

Split Strings by Pattern

import re

pattern = "[0-9_]+"
content = "56abc789h__31hello"
segments = re.split(pattern, content)
print(segments) # ['', 'abc', 'h', 'hello']
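
The first element above is an empty string because the content starts with a match, so nothing precedes the first delimiter. The sketch below shows two common variations under the same inputs: filtering out empty segments, and wrapping the pattern in a capturing group so that the delimiters are kept in the result.

```python
import re

content = "56abc789h__31hello"

# Drop the empty string produced by the leading delimiter
segments = [s for s in re.split(r"[0-9_]+", content) if s]
print(segments)  # ['abc', 'h', 'hello']

# A capturing group keeps the matched delimiters in the output list
with_delims = re.split(r"([0-9_]+)", content)
print(with_delims)  # ['', '56', 'abc', '789', 'h', '__31', 'hello']
```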

Substitute Substrings by Pattern

import re

pattern = "[0-9_]+"
content = "56abc789h__31hello"
result = re.sub(pattern, '***', content)
print(result) # ***abc***h***hello
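
re.sub also accepts a count argument to limit the number of replacements, and the replacement can be a function that receives each match object. A short sketch with the same inputs (the masking rule is just an illustration):

```python
import re

pattern = r"[0-9_]+"
content = "56abc789h__31hello"

# Replace only the first match
first_only = re.sub(pattern, "***", content, count=1)
print(first_only)  # ***abc789h__31hello

# Compute each replacement from the match: one '*' per matched character
masked = re.sub(pattern, lambda m: "*" * len(m.group()), content)
print(masked)  # **abc***h****hello
```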

Code Challenge

Try to modify the regex pattern in the editor to extract all phone numbers.
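
One possible approach is sketched below. The exercise does not specify a phone number format, so both the sample text and the assumed format (three digits, three digits, four digits, separated by hyphens) are hypothetical:

```python
import re

# Hypothetical sample text; the 3-3-4 hyphenated format is an assumption
content = "Call 555-123-4567 or 555-987-6543 before 5pm."

# \b anchors keep the pattern from matching inside longer digit runs
pattern = r"\b\d{3}-\d{3}-\d{4}\b"
phones = re.findall(pattern, content)
print(phones)  # ['555-123-4567', '555-987-6543']
```

Other formats (parentheses, spaces, country codes) would need a different or more permissive pattern.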
