A quick guide to Python regular expressions

A regular expression, often abbreviated as regex or re, is a pattern language for advanced text matching. It is an important tool for text processing and is commonly used to search and replace strings. Regular expressions were first used in Unix text editors, and now almost all high-level programming languages support them.

In Python, regular expressions are available through the built-in re module.

Common symbols of regular expressions

Matching a single character

Matching multiple characters

Other matches
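As a minimal illustration of the three groups above, the following sketch exercises a few standard symbols from Python's re syntax. The sample strings and variable names are invented for the example.

```python
import re

# Single-character matches
single_dot = re.findall(r'a.c', 'abc a-c ac')      # '.' matches any one character except a newline
digits = re.findall(r'\d', 'a1b22')                 # '\d' matches one digit
char_class = re.findall(r'[xyz]', 'lazy fox')       # '[...]' matches one character from the set

# Multiple-character matches (quantifiers)
star = re.findall(r'ab*', 'a ab abb')               # '*' means zero or more
plus = re.findall(r'ab+', 'a ab abb')               # '+' means one or more
optional = re.findall(r'colou?r', 'color colour')   # '?' means zero or one
counted = re.findall(r'\d{2,3}', '1 22 333 4444')   # '{m,n}' means between m and n repetitions

# Other matches
anchored = re.findall(r'^\w+', 'hello world')       # '^' anchors at the start of the string
ending = re.findall(r'\w+$', 'hello world')         # '$' anchors at the end of the string
either = re.findall(r'cat|dog', 'cat, dog, bird')   # '|' means alternation

print(single_dot)  # ['abc', 'a-c']
print(counted)     # ['22', '333', '444']
```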

Using the re module

Python's re module provides several matching functions, which can extract or transform text with regular expressions in different scenarios.

| Function | Description | Return value |
| --- | --- | --- |
| match(pattern, string, flags=0) | Try to match the regular expression pattern at the beginning of the string, with optional flags | A match object on success; None on failure |
| search(pattern, string, flags=0) | Search the string for the first occurrence of the regular expression pattern, with optional flags | A match object on success; None on failure |
| findall(pattern, string, flags=0) | Find all non-overlapping occurrences of the regular expression pattern in the string | A list of matches |
| finditer(pattern, string, flags=0) | Same as findall, but returns an iterator instead of a list | An iterator of match objects |
| split(pattern, string, maxsplit=0, flags=0) | Split the string at each match of the regular expression pattern. At most maxsplit splits are made; by default the string is split at every match | The list of pieces |
| sub(pattern, repl, string, count=0, flags=0) | Replace occurrences of the regular expression pattern in the string with repl. At most count replacements are made; by default all occurrences are replaced | The resulting string |
| purge() | Clear the cache of implicitly compiled regular expression patterns | None |
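To make the list of functions more concrete, here is a small sketch contrasting match, search, and finditer on one sample string. The string and the variable names are illustrative only.

```python
import re

text = 'id=42; id=7; id=1234'

# match only succeeds at the start of the string, so a digit pattern fails here
at_start = re.match(r'\d+', text)

# search scans forward and returns the first occurrence
first = re.search(r'\d+', text)

# finditer yields one match object per occurrence, so positions are available too
positions = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]

print(at_start)        # None
print(first.group())   # 42
print(positions)       # [('42', (3, 5)), ('7', (10, 11)), ('1234', (16, 20))]
```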

1. match tries to match the pattern starting from the first character of the string. It returns None if there is no match at that position, and a match object if there is.

import re
a = 'A83C72D1D8E67'
r = re.match('A83', a)
print(r)  # Prints the match object
print(r.group())  # Use the group method to extract the matched text
print(r.span())  # Returns a tuple with the matching position (start, end)
Output
<re.Match object; span=(0, 3), match='A83'>
A83
(0, 3)
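Because match returns None on failure, calling group() directly on the result can raise an AttributeError. A minimal sketch of guarding against that, reusing the string from the example above (the fallback text 'no match' is just a placeholder):

```python
import re

a = 'A83C72D1D8E67'

# 'C72' occurs in the string, but not at position 0, so match returns None
r = re.match('C72', a)

# Check the result before calling group(); calling it on None would raise AttributeError
result = r.group() if r is not None else 'no match'
print(result)  # no match
```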

**2. re.compile converts a regular expression into a pattern object, which improves efficiency when the same pattern is used repeatedly. After compiling once, there is no need to convert the pattern again each time it is used.**

compile(pattern, flags=0)
pattern: the regular expression to compile
flags: the matching mode (optional)

Note that compile returns a pattern object, not a match result. It is of little use on its own; call its findall(), search(), or match() methods to do the actual matching.

res = re.compile(r'\w+')
# search is similar to match, but it scans the whole string and returns the first match found; it returns None if nothing matches
res2 = res.search('*##abcd123_ABC####123').group()
print(res2)
Output
abcd123_ABC
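The benefit of compiling shows up when one pattern is applied many times. The sketch below compiles a pattern once with the re.IGNORECASE flag and reuses it across several strings; the sample data is made up for illustration.

```python
import re

# Compile once with the IGNORECASE flag, then reuse the pattern object
word = re.compile(r'[a-z]+', re.IGNORECASE)

lines = ['Hello 123', '456 World', '789']

# The same compiled pattern is applied to every line
first_words = []
for line in lines:
    m = word.search(line)
    first_words.append(m.group() if m else None)

print(first_words)  # ['Hello', 'World', None]
```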

**3. findall returns all non-overlapping matches of the pattern in the string as a list. If nothing matches, it returns an empty list.**

res = re.findall('ab+', 'abcdabddac')
print(res)
Output
['ab', 'ab']
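What findall puts in the list depends on whether the pattern contains capturing groups. A short sketch of the cases, using an invented sample string:

```python
import re

text = 'x=1, y=22, z=333'

# No groups: each list item is the whole match
whole = re.findall(r'\w=\d+', text)

# One group: each item is just the captured group
values = re.findall(r'\w=(\d+)', text)

# Two groups: each item is a tuple of the groups
pairs = re.findall(r'(\w)=(\d+)', text)

# No match at all: an empty list
none_found = re.findall(r'q=\d+', text)

print(whole)       # ['x=1', 'y=22', 'z=333']
print(values)      # ['1', '22', '333']
print(pairs)       # [('x', '1'), ('y', '22'), ('z', '333')]
print(none_found)  # []
```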

**4. re.split(pattern, string, maxsplit=0, flags=0): Split the string at each match of the pattern and return a list.**

res = re.split(r'\W', '123#abc#')
print(res)
Output
['123', 'abc', '']
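Two variations on split are often useful: limiting the number of splits with maxsplit, and keeping the separators by wrapping the pattern in a capturing group. A minimal sketch with an invented sample string:

```python
import re

s = '123#abc#xyz'

# Default: split at every match
parts = re.split(r'\W', s)

# maxsplit=1: split only at the first match and leave the rest intact
head_tail = re.split(r'\W', s, maxsplit=1)

# A capturing group keeps the separators in the result
with_seps = re.split(r'(\W)', s)

print(parts)      # ['123', 'abc', 'xyz']
print(head_tail)  # ['123', 'abc#xyz']
print(with_seps)  # ['123', '#', 'abc', '#', 'xyz']
```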

5. re.sub replaces the parts of a string that match the pattern

import re
a = 'abcABC'
r = re.sub('abc', 'ABC', a)
print(r)
Output
ABCABC
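Beyond a plain replacement, sub accepts a count limit and backreferences in the replacement string, and the related function subn also reports how many replacements were made. A brief sketch with illustrative inputs:

```python
import re

a = 'abcABCabc'

# count=1 limits the replacement to the first match
once = re.sub('abc', 'X', a, count=1)

# A backreference (\1, \2) reuses captured text in the replacement
swapped = re.sub(r'(\d+)-(\d+)', r'\2-\1', '10-20')

# subn returns the new string together with the number of replacements made
new_text, n = re.subn('abc', 'X', a)

print(once)         # XABCabc
print(swapped)      # 20-10
print(new_text, n)  # XABCX 2
```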

Greedy and non-greedy matching

Quantifiers in Python are greedy by default: they always try to match as many characters as possible.

Adding the non-greedy operator "?" after a quantifier such as "*", "+", or "?" makes it match as few characters as possible instead.

res = re.findall('(p.+)', 'pythonpythonpython')
print(res)
Output
['pythonpythonpython']

res = re.findall('(p.+?)', 'pythonpythonpython')
print(res)
Output
['py', 'py', 'py']
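A common place where the difference matters is extracting text between delimiters. The sketch below uses a made-up HTML-like string; it only illustrates greediness and is not a recommendation to parse HTML with regular expressions.

```python
import re

html = '<b>one</b><b>two</b>'

# Greedy: '.*' runs to the last '</b>', swallowing both elements in one match
greedy = re.findall(r'<b>.*</b>', html)

# Non-greedy: '.*?' stops at the nearest '</b>', so each element matches separately
lazy = re.findall(r'<b>(.*?)</b>', html)

print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['one', 'two']
```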

**Why are regular expressions usually written with an r prefix?**

In Python, an r in front of a string literal marks it as a raw string (r is short for raw). In a raw string, backslashes are kept as written instead of being treated as escape sequences, so the backslashes that regular expressions rely on do not need to be doubled. For example, to write a backslash followed by n rather than a newline character, you can do this:

r'\n'
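The effect of the prefix can be checked directly. In this small sketch (the path string is an invented example), the raw and non-raw spellings of a pattern that matches a literal backslash are compared:

```python
import re

newline = '\n'   # one character: a real newline
raw = r'\n'      # two characters: a backslash and the letter n
print(len(newline), len(raw))  # 1 2

# In a pattern, a literal backslash is written as two backslashes;
# a raw string lets that be typed as-is
path = 'C:\\temp'                        # the text C:\temp
with_raw = re.findall(r'\\', path)       # raw string: two backslashes typed
without_raw = re.findall('\\\\', path)   # normal string: four backslashes typed
print(with_raw == without_raw)           # True
```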

Note that regular expressions do not need to be memorized deliberately; commonly used patterns can be looked up online. It is also generally better to try Python's built-in string methods first and to turn to regular expressions only when those are not enough.
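As a rough illustration of that preference, the sketch below handles two simple tasks with built-in string methods and uses a regular expression only where the target text varies. The file name is a hypothetical example.

```python
import re

s = 'report_2024.csv'

# Simple checks and fixed replacements need no regex
is_csv = s.endswith('.csv')
dashed = s.replace('_', '-')

# A regex is worthwhile when the pattern varies, e.g. "any run of digits"
numbers = re.findall(r'\d+', s)

print(is_csv)   # True
print(dashed)   # report-2024.csv
print(numbers)  # ['2024']
```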
