A regular expression (often shortened to regex or re) is a pattern language for advanced text matching. It is an important text-processing tool, commonly used to search for and replace strings. Regular expressions were first used in Unix text editors, and almost all high-level programming languages now support them.
In Python, regular expressions are available through the built-in re module.
Match a single character
Match multiple characters
Other matches
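As a rough illustration of these three categories, the sketch below uses a few standard metacharacters from the re module. The sample strings and variable names are invented for the example, and the selection is not exhaustive.

```python
import re

# Match a single character
single_any = re.findall(r'.', 'a1_')            # . matches any character except a newline
single_digit = re.findall(r'\d', 'a1b22')       # \d matches one digit
single_set = re.findall(r'[abc]', 'cabbage')    # [...] matches one character from the set

# Match multiple characters
star = re.findall(r'ab*', 'a ab abb')           # * means zero or more of the preceding item
plus = re.findall(r'ab+', 'a ab abb')           # + means one or more
optional = re.findall(r'ab?', 'a ab abb')       # ? means zero or one
counted = re.findall(r'\d{2,3}', '1 22 333 4444')  # {m,n} means between m and n repetitions

# Other matches
anchored = re.findall(r'^\w+', 'hello world')   # ^ anchors at the start of the string
ending = re.findall(r'\w+$', 'hello world')     # $ anchors at the end of the string
either = re.findall(r'cat|dog', 'cat dog bird') # | matches either alternative

print(single_digit, plus, counted, either)
```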
Python's re module provides a number of matching functions for extracting text from strings in different scenarios.
Function | Description | Return Value |
---|---|---|
match(pattern, string, flags=0) | Match the regular expression pattern, with optional flags, at the beginning of the string | A match object on success; None on failure |
search(pattern, string, flags=0) | Search the string, with optional flags, for the first occurrence of the regular expression pattern | A match object on success; None on failure |
findall(pattern, string, flags=0) | Find all non-overlapping occurrences of the regular expression pattern in the string | A list of matches |
finditer(pattern, string, flags=0) | Same as findall, but returns an iterator instead of a list | An iterator of match objects |
split(pattern, string, maxsplit=0, flags=0) | Split the string at each occurrence of the pattern. At most maxsplit splits are made; by default the string is split at every match | The list of pieces |
sub(pattern, repl, string, count=0, flags=0) | Replace up to count occurrences of the pattern in the string with repl; replaces all occurrences by default | The new string with the replacements applied |
purge() | Clear the cache of implicitly compiled regular expression patterns | None |
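finditer and purge are not demonstrated in the numbered examples, so here is a minimal sketch of both. The sample string and the variable name positions are chosen only for illustration.

```python
import re

text = 'A83C72D1D8E67'

# finditer yields one match object per non-overlapping match,
# so both the matched text and its position are available
positions = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]
print(positions)

# purge clears the module's cache of implicitly compiled patterns;
# it takes no arguments and returns None
re.purge()
```

An iterator is convenient when the input is large or when the match positions are needed, since findall returns only the matched strings.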
**1. match starts matching at the first character of the string. It returns a match object if the pattern matches there, and None otherwise.**
import re
a = 'A83C72D1D8E67'
r = re.match('A83', a)
print(r)          # Print the match object
print(r.group())  # Use the group method to extract the matched text
print(r.span())   # Return a tuple giving the match position (start, end)
Output
<re.Match object; span=(0, 3), match='A83'>
A83
(0, 3)
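Because match only looks at the start of the string, a pattern that occurs later yields None, and calling group() on None raises an AttributeError. The following sketch, which reuses the same sample string with illustrative variable names, shows one way to guard against that.

```python
import re

a = 'A83C72D1D8E67'

# 'C72' occurs in the string, but not at position 0, so match returns None
r = re.match('C72', a)
print(r)

# Check for None before calling group()
found = r.group() if r is not None else None

# search scans the whole string, so it does find 'C72'
s = re.search('C72', a)
print(s.span())
```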
**2. re.compile compiles a regular expression into a pattern object to improve matching efficiency. Once compiled, the pattern can be reused without being compiled again each time it is used.**
compile(pattern, flags=0)
pattern: the regular expression string
flags: the matching mode, for example re.I (ignore case) or re.M (multi-line)
compile returns a pattern object, not a match result. It is of little use on its own and is meant to be used through its methods, such as findall(), search(), and match().
res = re.compile(r'\w+')
# search is similar to match, but it scans the entire string and returns the first match; it returns None if nothing matches
res2 = res.search('*##abcd123_ABC####123').group()
print(res2)
Output
abcd123_ABC
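To illustrate reuse, here is a small sketch in which one compiled pattern, built with the re.IGNORECASE flag, is used through three different methods. The pattern and sample strings are made up for the example.

```python
import re

# Compile once, with a flag, then reuse the pattern object
word = re.compile(r'[a-z]+', re.IGNORECASE)

first = word.match('Hello, World')        # matches only at the start of the string
anywhere = word.search('123 Hello')       # first match anywhere in the string
all_words = word.findall('Hello, World')  # every non-overlapping match

print(first.group(), anywhere.group(), all_words)
```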
**3. findall finds all non-overlapping matches of the pattern in the string and returns them as a list. If nothing matches, it returns an empty list.**
res = re.findall('ab+', 'abcdabddac')
print(res)
Output
['ab', 'ab']
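The shape of the list returned by findall depends on the capturing groups in the pattern, which is easy to overlook. A short sketch with an invented sample string:

```python
import re

text = 'x=1, y=22, z=333'

no_group = re.findall(r'\w=\d+', text)        # no groups: the whole matches
one_group = re.findall(r'\w=(\d+)', text)     # one group: only the captured part
two_groups = re.findall(r'(\w)=(\d+)', text)  # several groups: a tuple per match
nothing = re.findall(r'q=\d+', text)          # no match: an empty list

print(no_group, one_group, two_groups, nothing)
```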
**4. re.split(pattern, string, maxsplit=0, flags=0): Split the string at each match of the pattern and return a list.**
res = re.split(r'\W', '123#abc#')
print(res)
Output
['123', 'abc', '']
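The maxsplit argument and capturing groups both change what split returns. The following is a minimal sketch with a made-up input string:

```python
import re

s = '123#abc#def'

parts_all = re.split(r'\W', s)              # split at every non-word character
parts_one = re.split(r'\W', s, maxsplit=1)  # stop after the first split
with_seps = re.split(r'(\W)', '123#abc')    # a capturing group keeps the separators

print(parts_all, parts_one, with_seps)
```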
**5. re.sub replaces matches of the pattern in the string and returns the new string.**
import re
a = 'abcABC'
r = re.sub('abc', 'ABC', a)
print(r)
Output
ABCABC
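A few variations on replacement, sketched with invented sample strings: limiting the number of replacements with count, reusing captured groups in repl, and using re.subn when the number of replacements is also needed.

```python
import re

a = 'abcABCabc'

# count limits how many matches are replaced
once = re.sub('abc', 'X', a, count=1)

# \1 and \2 in repl refer to the captured groups
swapped = re.sub(r'(\w+)@(\w+)', r'\2@\1', 'user@host')

# subn returns a tuple: (new string, number of replacements made)
result, n = re.subn('abc', 'X', a)

print(once, swapped, result, n)
```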
Quantifiers in Python are greedy by default: they always try to match as many characters as possible.
Appending the non-greedy operator "?" to a quantifier ("*", "+", "?", or "{m,n}") makes it match as few characters as possible.
res = re.findall('(p.+)', 'pythonpythonpython')
print(res)
Output
['pythonpythonpython']
res = re.findall('(p.+?)', 'pythonpythonpython')
print(res)
Output
['py', 'py', 'py']
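The difference matters most when a pattern has a closing delimiter. As a sketch, assuming a small made-up string with two tagged values:

```python
import re

html = '<b>one</b><b>two</b>'

# Greedy: .+ runs to the last </b>, swallowing the tags in between
greedy = re.findall(r'<b>(.+)</b>', html)

# Non-greedy: .+? stops at the first </b> it can
lazy = re.findall(r'<b>(.+?)</b>', html)

print(greedy)
print(lazy)
```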
In Python, prefixing a string literal with r makes it a raw string (r is short for raw). In a raw string, backslashes are kept as literal characters instead of starting escape sequences, so regular expression patterns do not need doubled backslashes. For example, to write a backslash followed by n rather than a newline character, you can do this:
r'\n'
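A quick sketch of why this matters for patterns: in a normal string '\b' is a backspace character, while in a raw string it reaches the re module as the word-boundary token. The sample strings are invented for the example.

```python
import re

normal_len = len('\n')   # one character: a newline
raw_len = len(r'\n')     # two characters: a backslash and the letter n

# Without r, '\b' is a backspace character, so nothing matches
plain = re.findall('\bcat\b', 'cat concat')

# With r, \b is passed through as the regex word boundary
raw = re.findall(r'\bcat\b', 'cat concat')

print(normal_len, raw_len, plain, raw)
```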
Note that regular expressions do not need to be memorized deliberately; commonly used patterns can be looked up online. In general, try the built-in string methods first, and turn to regular expressions only when those are not enough.
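As a rough sketch of that advice, the checks below use only built-in string methods, and a regular expression is used only for the part that needs a real pattern. The file name is hypothetical.

```python
import re

name = 'report_2024.csv'  # hypothetical file name

# Simple checks and replacements: built-in string methods are enough
is_csv = name.endswith('.csv')
has_report = 'report' in name
renamed = name.replace('_', '-')

# Pulling out a four-digit run is where a pattern becomes useful
year = re.search(r'\d{4}', name).group()

print(is_csv, has_report, renamed, year)
```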