Python regular expression learning

re.match() function:

Function syntax: re.match(pattern, string, flags=0)

Parameter Description:

pattern The regular expression to match
string The string to match against
flags Flags that control how the match is performed, such as case sensitivity or multi-line matching

If the match succeeds, re.match returns a Match object; otherwise it returns None.

You can use the group(num) or groups() methods of the Match object to get the matched text:

group(num=0) Returns the string matched by the entire expression (num=0) or by group num. You can pass several group numbers at once, in which case a tuple containing the values of those groups is returned
groups() Returns a tuple containing the strings of all groups, from group 1 up to the last group
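
As a minimal sketch of the two accessors described above (the sample string and pattern are made up for illustration and do not come from the examples in this post):

```python
import re

# Hypothetical sample data, chosen only to illustrate group() and groups().
m = re.match(r'(\w+)@(\w+)\.com', 'alice@example.com')

whole = m.group()        # same as m.group(0): the entire match
user = m.group(1)        # text captured by the first parenthesized group
pair = m.group(1, 2)     # several group numbers at once give a tuple
everything = m.groups()  # all groups, from 1 up to the last one

print(whole, user, pair, everything)
```

Note that group() with no argument and groups() are different calls: the first returns a single string, the second always returns a tuple.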

Examples:

>>> import re
>>> print(re.match('www', 'www.google.com').span())  # matches at the start
(0, 3)
>>> print(re.match('com', 'www.google.com'))  # does not match at the start
None

Examples:

>>> import re
>>> line = "Cats are smarter than dogs"
>>> # .* matches any run of characters except newlines; (.*?) is its non-greedy form
>>> matchObj = re.match(r'(.*)are(.*?).*', line, re.M | re.I)
>>> if matchObj:
...     print('matchObj.group():', matchObj.group())
...     print('matchObj.group(1):', matchObj.group(1))
...     print('matchObj.group(2):', matchObj.group(2))
... else:
...     print('No match!!!')
...
matchObj.group(): Cats are smarter than dogs
matchObj.group(1): Cats
matchObj.group(2):

Here group(1) is 'Cats ' (with a trailing space) and group(2) is empty: the non-greedy (.*?) matches as little as possible, and the final .* consumes the rest of the line.

re.search() function: scans the entire string and returns the first successful match

Function syntax: re.search(pattern, string, flags=0)

Parameter Description:

pattern The regular expression to match
string The string to search
flags Flags that control how the match is performed, such as case sensitivity or multi-line matching

If the match succeeds, re.search returns a Match object; otherwise it returns None.

You can use the group(num) or groups() methods of the Match object to get the matched text:

group(num=0) Returns the string matched by the entire expression (num=0) or by group num. You can pass several group numbers at once, in which case a tuple containing the values of those groups is returned
groups() Returns a tuple containing the strings of all groups, from group 1 up to the last group

Examples:

>>> import re
>>> print(re.search('www', 'www.google.com').span())  # matches at the start
(0, 3)
>>> print(re.search('com', 'www.google.com').span())  # matches, but not at the start
(11, 14)

The difference between re.match and re.search:

re.match only matches at the beginning of the string: if the regular expression does not match there, the match fails and the function returns None. re.search, by contrast, scans the whole string and returns the first match it finds.

>>> import re
>>> line = 'Cats are smarter than dogs'
>>> matchObj = re.match(r'dogs', line, re.M | re.I)
>>> if matchObj:
...     print("match --> matchObj.group():", matchObj.group())
... else:
...     print("No match!!!")
...
No match!!!
>>> matchObj = re.search(r'dogs', line, re.M | re.I)
>>> if matchObj:
...     print("match --> matchObj.group():", matchObj.group())
... else:
...     print("No match!!!")
...
match --> matchObj.group(): dogs

re.sub() function: (retrieve and replace) used to replace matches in a string

Syntax: re.sub(pattern, repl, string, count=0)

parameter:

pattern The regular expression pattern string
repl The replacement string; it can also be a function
string The original string to be searched and modified
count The maximum number of matches to replace; the default, 0, replaces all matches
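
The effect of count can be sketched as follows (the input string is made up for illustration):

```python
import re

text = 'a-b-c-d'  # hypothetical input, not from the examples in this post

replace_all = re.sub(r'-', '+', text)           # count defaults to 0: replace every match
replace_two = re.sub(r'-', '+', text, count=2)  # stop after the first two matches

print(replace_all)
print(replace_two)
```

With count=2 only the first two hyphens are replaced and the rest of the string is left untouched.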

Examples:

>>> import re
>>> phone = '2004-959-559  #This is a number'
>>> # Delete the comment
>>> num = re.sub(r'#.*$', "", phone)
>>> print("Phone number:", num)
Phone number: 2004-959-559
>>> # Remove the hyphens (r'\D' would remove every non-digit character instead)
>>> num = re.sub(r'-', "", phone)
>>> print("Phone number:", num)
Phone number: 2004959559  #This is a number

 

The repl parameter is a function:

>>> import re
>>> # Multiply each matched number by 2
>>> def double(matched):
...     value = int(matched.group('value'))
...     return str(value * 2)
...
>>> s = 'A23G4HFD567'
>>> print(re.sub(r'(?P<value>\d+)', double, s))
A46G8HFD1134

re.compile() function:

Compiles a regular expression into a regular expression (Pattern) object, whose methods such as match() and search() can then be used.

Syntax format: re.compile(pattern[, flags])

parameter:

pattern A regular expression in string form
flags (optional) The matching mode, such as ignore case or multi-line mode. The available values are:
re.I Ignore case
re.L Makes the special character classes \w, \W, \b, \B, \s, \S depend on the current locale
re.M Multi-line mode
re.S Makes '.' match any character including the newline character (by default '.' does not match the newline character)
re.U Makes the special character classes \w, \W, \b, \B, \s, \S depend on the Unicode character properties database
re.X Verbose mode: to improve readability, whitespace in the pattern and comments after '#' are ignored
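
A rough sketch of two of these flags in use, re.X and re.S (the date pattern and the sample strings are assumptions made up for illustration):

```python
import re

# re.X: whitespace and '#' comments inside the pattern are ignored.
date_pattern = re.compile(r'''
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
''', re.X)
date_parts = date_pattern.match('2024-01-31').groups()

# re.S: '.' also matches the newline character.
without_s = re.match(r'a.b', 'a\nb')
with_s = re.match(r'a.b', 'a\nb', re.S)

print(date_parts, without_s, with_s is not None)
```

Without re.S the second pattern fails because '.' stops at the newline; with re.S it matches all three characters.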

Examples:

>>> import re
>>> pattern = re.compile(r'\d+')
>>> m = pattern.match('one12twothree34four')  # match at the start of the string: no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 2, 10)  # start matching at 'e' (index 2): no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 3, 10)  # start matching at '1' (index 3): match
>>> print(m)  # a Match object (shown as <re.Match object; ...> on Python 3.7+)
<_sre.SRE_Match object; span=(3, 5), match='12'>
>>> m.group(0)  # the 0 can be omitted
'12'
>>> m.start(0)  # the 0 can be omitted
3
>>> m.end(0)  # the 0 can be omitted
5
>>> m.span(0)  # the 0 can be omitted
(3, 5)

In the example, a successful match returns a Match object, where:

group([group1, ...]) Returns one or more matched groups; to get the entire matched substring, use group() or group(0)
start([group]) Returns the index in the string where the substring matched by the group starts (the index of its first character); group defaults to 0
end([group]) Returns the index in the string where the substring matched by the group ends (the index of its last character + 1); group defaults to 0
span([group]) Returns (start(group), end(group))
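
One way to see how start(), end() and span() relate to group() is to slice the original string with them. This is only a sketch; it reuses the sample string from the example above but calls re.search instead of a compiled pattern:

```python
import re

text = 'one12twothree34four'
m = re.search(r'\d+', text)

start, end = m.span()       # the same values as m.start() and m.end()
sliced = text[start:end]    # slicing with the span reproduces the match

print(start, end, sliced == m.group())
```

Because end() is the index of the last matched character plus one, the pair can be used directly as a slice.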

Example, continued:

>>> import re
>>> pattern = re.compile(r'([a-z]+) ([a-z]+)', re.I)  # re.I means ignore case
>>> m = pattern.match('hello world wide web')
>>> print(m)  # the match succeeds and a Match object is returned
<_sre.SRE_Match object; span=(0, 11), match='hello world'>
>>> m.group(0)  # the entire matched substring
'hello world'
>>> m.span()  # the index range of the entire matched substring
(0, 11)
>>> m.group(1)  # the substring matched by the first group
'hello'
>>> m.span(1)  # the index range of the first group
(0, 5)
>>> m.group(2)  # the substring matched by the second group
'world'
>>> m.span(2)  # the index range of the second group
(6, 11)
>>> m.groups()  # equivalent to (m.group(1), m.group(2), ...)
('hello', 'world')
>>> m.group(3)  # there is no third group, so an error is raised
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    m.group(3)
IndexError: no such group

findall() function:

Finds all substrings of the string that match the regular expression and returns them as a list; if nothing matches, it returns an empty list.

Note: match and search return at most one match, whereas findall returns all of them.

Syntax format (method of a compiled pattern): pattern.findall(string[, pos[, endpos]])

parameter:

string The string to be matched
pos Optional; the index at which the search starts (default 0)
endpos Optional; the index at which the search stops (default: the length of the string)

Examples:

>>> import re
>>> pattern = re.compile(r'\d+')  # find the numbers
>>> result1 = pattern.findall('runoob 123 google 456')
>>> result2 = pattern.findall('run88oob123google456', 0, 10)
>>> print(result1)
['123', '456']
>>> print(result2)
['88', '12']
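
The note above (search returns one match, findall returns all of them) can be sketched with the module-level functions, which take the pattern as their first argument:

```python
import re

text = 'runoob 123 google 456'

first = re.search(r'\d+', text)          # stops at the first match
all_matches = re.findall(r'\d+', text)   # collects every non-overlapping match
none_found = re.findall(r'\d+', 'no digits here')  # no match: an empty list, not None

print(first.group(), all_matches, none_found)
```

Unlike search, findall never returns None, so its result can be iterated over without a prior check.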

re.finditer() function:

Similar to findall: finds all substrings of the string that match the regular expression, but returns them as an iterator of Match objects.

Syntax format: re.finditer(pattern, string, flags=0)

parameter:

pattern The regular expression to match
string The string to search
flags Flags that control how the match is performed

Examples:

>>> import re
>>> it = re.finditer(r'\d+', '12a32bc43jf3')
>>> for match in it:
...     print(match.group())
...
12
32
43
3
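
Because finditer yields Match objects rather than plain strings, each result also carries its position. A small sketch using the same input as above:

```python
import re

text = '12a32bc43jf3'

# Collect each matched number together with its (start, end) span.
found = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]

for value, span in found:
    print(value, span)
```

This is the main reason to prefer finditer over findall when the location of each match matters.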

re.split() function:

The split method splits the string at each match of the pattern and returns the pieces as a list. The syntax is as follows:

  re.split(pattern, string[, maxsplit=0, flags=0])

parameter:

pattern The regular expression to match
string The string to split
maxsplit Maximum number of splits; maxsplit=1 splits once. The default, 0, means no limit
flags Flags that control how the match is performed

Examples:

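>>> import re
>>> re.split(r'\W+', 'runoob, runoob, runoob.')
['runoob', 'runoob', 'runoob', '']
>>> re.split(r'(\W+)', ' runoob, runoob, runoob.')
['', ' ', 'runoob', ', ', 'runoob', ', ', 'runoob', '.', '']
>>> re.split(r'\W+', ' runoob, runoob, runoob.', 1)
['', 'runoob, runoob, runoob.']
>>> re.split(r'a+', 'hello world')  # the pattern matches nowhere, so the string is not split
['hello world']

Note that a pattern such as 'a*' can match the empty string; since Python 3.7, re.split also splits at such empty matches, so it would break 'hello world' apart at every character.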

Regular expression object:

· re.compile() returns a Pattern object (called RegexObject in older documentation)

· re.match() and re.search() return a Match object (formerly MatchObject), which provides:

group(): Returns the string matched by the RE

start(): Returns the position where the match starts

end(): Returns the position where the match ends

span(): Returns a tuple (start, end) containing the positions of the match

Regular expression modifiers-optional flags:

Regular expression functions accept optional flags that control how matching is performed. Multiple flags can be combined with bitwise OR (|); for example, re.I | re.M sets both the I and M flags:

Modifier Function
re.I Makes matching case-insensitive
re.L Does locale-aware matching
re.M Multi-line matching; affects ^ and $
re.S Makes . match every character, including the newline
re.U Interprets characters according to the Unicode character set; affects \w, \W, \b, \B
re.X Allows a more flexible format (whitespace and comments in the pattern) so that regular expressions are easier to read
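
A minimal sketch of combining flags with |, assuming a made-up three-line input:

```python
import re

text = 'first line\nSECOND line\nthird line'  # hypothetical input

# Without re.M, ^ only matches at the very start of the string.
plain = re.findall(r'^\w+', text)

# With re.M alone, ^ matches at the start of each line, but [a-z] is still case-sensitive.
only_m = re.findall(r'^[a-z]+', text, re.M)

# re.I | re.M sets both flags at once.
combined = re.findall(r'^[a-z]+', text, re.I | re.M)

print(plain, only_m, combined)
```

Each flag changes one aspect of matching independently, which is why they can be OR-ed together freely.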

Regular expression pattern:

·The pattern string uses a special syntax to represent a regular expression;

·Letters and digits represent themselves: a letter or digit in a pattern matches the same character in the string;

·Most letters and digits take on a different meaning when preceded by a backslash;

·Many punctuation marks have a special meaning and match themselves only when escaped;

·The backslash itself must be escaped with a backslash;

·Since regular expressions usually contain backslashes, it is best to write them as raw strings;

·For example, the raw string r'\t' is equivalent to '\\t' and matches a tab character;

·The following table lists the special elements of the regular expression pattern syntax. If you supply optional flags along with a pattern, the meaning of some pattern elements changes:

Mode Function
^ Matches the beginning of the string
$ Matches the end of the string
. Matches any character except the newline character; when the DOTALL flag is specified, it matches any character including the newline
[...] Matches any one of a set of characters: [amk] matches 'a', 'm' or 'k'
[^...] Matches any character not in the set: [^abc] matches any character other than a, b, c
re* Matches zero or more of the preceding expression
re+ Matches one or more of the preceding expression
re? Matches zero or one of the preceding expression; placed after another quantifier (*?, +?, ??, {n,m}?) it makes that quantifier non-greedy
re{n} Matches exactly n of the preceding expression (for example, "o{2}" cannot match the "o" in "Bob", but it can match the two o's in "food")
re{n,} Matches at least n of the preceding expression. For example, "o{2,}" cannot match the "o" in "Bob", but it can match all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*"
re{n,m} Matches the preceding expression n to m times, greedily
a|b Matches a or b
(re) Matches the expression in parentheses and captures it as a group
(?imx) Turns on the i, m, or x optional flags inside the pattern; in Python's re module such flags apply to the whole pattern and must appear at its start
(?-imx) Turns off the i, m, or x optional flags; Python's re module accepts this only in the scoped form (?-imx:re)
(?:re) Like (...), but does not create a group
(?imx:re) Turns on the i, m, or x optional flags within the parentheses only
(?-imx:re) Turns off the i, m, or x optional flags within the parentheses only
(?#...) Comment
(?=re) Positive lookahead. Succeeds if re matches at the current position, without consuming any characters; the rest of the pattern then continues matching from that same position
(?!re) Negative lookahead. The opposite of the positive lookahead: succeeds if re does not match at the current position of the string
(?>re) Atomic (independent) group: once re has matched, the engine does not backtrack into it (supported by Python's re module from version 3.11)
\w Matches letters, digits, and the underscore
\W Matches any character that is not a letter, digit, or underscore
\s Matches any whitespace character (equivalent to [ \t\n\r\f\v])
\S Matches any non-whitespace character
\d Matches any digit (equivalent to [0-9] for ASCII text)
\D Matches any non-digit
\A Matches at the start of the string
\Z Matches at the end of the string. In Python's re module it matches only at the very end; in some other regex flavors it also matches just before a trailing newline
\z Matches at the very end of the string (older versions of Python's re module do not accept it; use \Z there)
\G Matches at the position where the previous match ended (not supported by Python's re module)
\b Matches a word boundary, that is, the position between a word and a space (for example, 'er\b' can match the 'er' in "never" but not the 'er' in "verb")
\B Matches a non-word boundary (for example, 'er\B' can match the 'er' in "verb" but not the 'er' in "never")
\n, \t, etc. Match a newline character, a tab character, etc.
\1...\9 Matches the content of the nth captured group
\10 Matches the content of the 10th group, if that group exists; some regex flavors otherwise interpret it as an octal character code
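
A few of the table's less obvious entries (greedy versus non-greedy quantifiers, lookaheads, and backreferences) can be sketched like this. All input strings are made up for illustration:

```python
import re

html = '<b>bold</b><i>italic</i>'  # hypothetical input

greedy = re.findall(r'<.+>', html)   # .+ takes as much as possible
lazy = re.findall(r'<.+?>', html)    # .+? stops at the first '>'

# Lookaheads test what follows without consuming it.
pos_look = re.findall(r'foo(?=bar)', 'foobar foobaz')     # 'foo' only when followed by 'bar'
neg_look = re.findall(r'foo(?!bar)\w*', 'foobar foobaz')  # 'foo...' only when not followed by 'bar'

# Backreference: \1 must repeat exactly what group 1 matched.
doubled = re.findall(r'\b(\w+) \1\b', 'this is is a test test')

print(greedy, lazy, pos_look, neg_look, doubled)
```

In the last pattern the \b anchors keep the 'is' inside 'this' from pairing with the following 'is'.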

Examples of regular expressions:

Character matching:

[Pp]ython Matches "Python" or "python"
rub[ye] Matches "ruby" or "rube"
[aeiou] Matches any one of the letters in the brackets
[0-9] Matches any digit
[a-z] Matches any lowercase letter
[A-Z] Matches any uppercase letter
[a-zA-Z0-9] Matches any letter or digit
[^aeiou] Matches any character other than the letters a, e, i, o, u
[^0-9] Matches any character other than a digit
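
These character classes can be tried out with re.findall. The sketch below uses made-up input strings:

```python
import re

text = 'Python and python, ruby and rube'  # hypothetical input

pythons = re.findall(r'[Pp]ython', text)
rubies = re.findall(r'rub[ye]', text)
vowels = re.findall(r'[aeiou]', 'regex')
non_digits = re.findall(r'[^0-9]+', 'abc123def')  # runs of characters that are not digits

print(pythons, rubies, vowels, non_digits)
```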

Special character class:

. Matches any single character except "\n"; to match any character including "\n", use the re.S flag or a pattern such as "[\s\S]"
\d Matches a digit character
\D Matches a non-digit character
\s Matches any whitespace character
\S Matches any non-whitespace character
\w Matches any word character (letters, digits, and the underscore)
\W Matches any non-word character
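
A short sketch of the special character classes in action, assuming a made-up input line:

```python
import re

text = 'id_7 = 42\n'  # hypothetical input

digits = re.findall(r'\d', text)    # single digit characters
words = re.findall(r'\w+', text)    # runs of letters, digits and underscores
spaces = re.findall(r'\s', text)    # whitespace, including the trailing newline

dot_default = re.findall(r'.', 'a\nb')     # '.' skips the newline
dot_all = re.findall(r'.', 'a\nb', re.S)   # with re.S, '.' matches it too

print(digits, words, spaces, dot_default, dot_all)
```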
