re.match() function:
Function syntax: re.match(pattern, string, flags=0)
Parameter Description:
Parameter | Description |
---|---|
pattern | The regular expression to match |
string | The string to match against |
flags | Optional flags that control how matching is done, such as case sensitivity or multi-line matching |
If the match succeeds, re.match returns a match object; otherwise it returns None.
Use the match object's group(num) or groups() method to retrieve the matched text:
Method | Description |
---|---|
group(num=0) | Returns the string matched by the entire expression (num=0) or by group num. Several group numbers can be passed at once, in which case a tuple of the corresponding group values is returned |
groups() | Returns a tuple of all group strings, from group 1 up to the last group in the pattern |
Examples:
>>> import re
>>> print(re.match('www', 'www.google.com').span())  # Matches at the start
(0, 3)
>>> print(re.match('com', 'www.google.com'))  # Does not match at the start
None
Examples:
>>> import re
>>> line = "Cats are smarter than dogs"
>>> # .* matches any run of characters (zero or more) other than a newline
>>> # (.*?) is non-greedy; because .* follows it, it matches the empty string here
>>> matchObj = re.match(r'(.*)are(.*?).*', line, re.M | re.I)
>>> if matchObj:
...     print('matchObj.group():', matchObj.group())
...     print('matchObj.group(1):', matchObj.group(1))
...     print('matchObj.group(2):', matchObj.group(2))
... else:
...     print('No match!!!')
...
matchObj.group(): Cats are smarter than dogs
matchObj.group(1): Cats
matchObj.group(2):
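The group(num) and groups() behavior described above can be sketched as follows. The sample pattern and variable names are made up for illustration only; they are not part of the original example.

```python
import re

# Hypothetical sample pattern with three capturing groups.
m = re.match(r'(\w+) (\w+) (\w+)', 'Cats are smarter than dogs')

whole = m.group()         # entire match, same as m.group(0)
first = m.group(1)        # text of the first group
pair = m.group(1, 3)      # several group numbers at once give a tuple
everything = m.groups()   # all groups, from 1 to the last one

print(whole)        # Cats are smarter
print(first)        # Cats
print(pair)         # ('Cats', 'smarter')
print(everything)   # ('Cats', 'are', 'smarter')
```

Because re.match returns None when nothing matches, real code would normally check the result before calling group().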
re.search() function: scan the entire string and return the first successful match
Function syntax: re.search( pattern , string , flags=0)
Parameter Description:
Parameter | Description |
---|---|
pattern | The regular expression to match |
string | The string to search |
flags | Optional flags that control how matching is done, such as case sensitivity or multi-line matching |
If the match succeeds, re.search returns a match object; otherwise it returns None.
Use the match object's group(num) or groups() method to retrieve the matched text:
Method | Description |
---|---|
group(num=0) | Returns the string matched by the entire expression (num=0) or by group num. Several group numbers can be passed at once, in which case a tuple of the corresponding group values is returned |
groups() | Returns a tuple of all group strings, from group 1 up to the last group in the pattern |
Examples:
>>> import re
>>> print(re.search('www', 'www.google.com').span())  # Matches at the start
(0, 3)
>>> print(re.search('com', 'www.google.com').span())  # Matches, although not at the start
(11, 14)
The difference between re.match and re.search:
re.match only matches at the beginning of the string: if the pattern does not match there, the match fails and the function returns None. re.search scans through the whole string until it finds a match.
>>> import re
>>> line = 'Cats are smarter than dogs'
>>> matchObj = re.match(r'dogs', line, re.M | re.I)
>>> if matchObj:
...     print("match --> matchObj.group():", matchObj.group())
... else:
...     print("No match!!!")
...
No match!!!
>>> matchObj = re.search(r'dogs', line, re.M | re.I)
>>> if matchObj:
...     print("match --> matchObj.group():", matchObj.group())
... else:
...     print("No match!!!")
...
match --> matchObj.group(): dogs
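One detail worth a small sketch: the re.M flag does not change where re.match looks. The two-line sample text below is invented for illustration.

```python
import re

# Hypothetical two-line sample text.
text = 'first line\ndogs on the second line'

# re.match is anchored at position 0 of the string, even with re.M.
at_start = re.match(r'dogs', text, re.M)

# re.search scans forward and returns the first match anywhere.
anywhere = re.search(r'dogs', text)

# With re.M, '^' in re.search can match at the start of any line.
line_start = re.search(r'^dogs', text, re.M)

print(at_start)             # None
print(anywhere.span())      # (11, 15)
print(line_start.span())    # (11, 15)
```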
re.sub() function: (retrieve and replace) used to replace matches in a string
Syntax: re.sub(pattern, repl, string, count=0, flags=0)
Parameters:
Parameter | Description |
---|---|
pattern | The regular expression pattern string |
repl | The replacement string; it can also be a function |
string | The original string to search and replace in |
count | The maximum number of matches to replace; the default, 0, replaces all matches |
flags | Optional flags that control how matching is done |
Examples:
>>> import re
>>> phone = '2004-959-559 #This is a number'
>>> # Delete the comment
>>> num = re.sub(r'#.*$', "", phone)
>>> print("Phone number:", num)
Phone number: 2004-959-559
>>> # Remove the hyphens (the comment stays, because only '-' is replaced)
>>> num = re.sub(r'-', "", phone)
>>> print("Phone number:", num)
Phone number: 2004959559 #This is a number
The repl parameter can be a function:
>>> import re
>>> # Multiply each matched number by 2
>>> def double(matched):
...     value = int(matched.group('value'))
...     return str(value * 2)
...
>>> s = 'A23G4HFD567'
>>> print(re.sub(r'(?P<value>\d+)', double, s))
A46G8HFD1134
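As a rough sketch of two things the examples above do not show, the count parameter and group references inside a replacement string, consider the following. The sample dates are invented for illustration.

```python
import re

# Hypothetical sample text containing two dates.
dates = '2024-01-15 and 2023-12-31'

# count=1 replaces only the first match.
first_only = re.sub(r'-', '/', dates, count=1)

# \1, \2, \3 in the replacement refer to the captured groups.
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', dates)

print(first_only)   # 2024/01-15 and 2023-12-31
print(swapped)      # 15/01/2024 and 31/12/2023
```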
re.compile() function:
Compiles a regular expression into a regular expression (Pattern) object, whose methods such as match() and search() can then be used.
Syntax format: re.compile( pattern [, flags ])
Parameters:
Parameter | Description |
---|---|
pattern | A regular expression in string form |
flags | Optional. The matching mode, such as ignore case or multi-line mode. The available flags are listed in the rows below |
re.I | Ignore case |
re.L | Makes the special character sets \w, \W, \b, \B, \s, \S depend on the current locale |
re.M | Multi-line mode |
re.S | Makes '.' match any character, including a newline (by default '.' does not match a newline) |
re.U | Makes the special character sets \w, \W, \b, \B, \s, \S depend on the Unicode character properties database |
re.X | Verbose mode: for readability, whitespace and comments after '#' in the pattern are ignored |
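The effect of the individual flags can be sketched as below. This is a minimal illustration, not an exhaustive demonstration; the sample text and variable names are made up.

```python
import re

# Hypothetical two-line sample text.
text = 'Hello\nworld'

ignore_case = re.compile(r'hello', re.I).match(text)      # matches 'Hello'
multi_line = re.compile(r'^world', re.M).search(text)     # '^' also matches after '\n'
dot_default = re.compile(r'Hello.world').match(text)      # None: '.' does not match '\n'
dot_all = re.compile(r'Hello.world', re.S).match(text)    # '.' now matches '\n'

# re.X lets the pattern be laid out with whitespace and comments.
verbose = re.compile(r'''
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
''', re.X)
number = verbose.match('555-1234')

print(ignore_case.group())   # Hello
print(multi_line.span())     # (6, 11)
print(dot_default)           # None
print(number.group())        # 555-1234
```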
Examples:
>>> import re
>>> pattern = re.compile(r'\d+')
>>> m = pattern.match('one12twothree34four')  # Tries to match at the start; no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 2, 10)  # Starts at the position of 'e'; no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 3, 10)  # Starts at the position of '1'; matches
>>> print(m)  # Returns a Match object (shown as <re.Match object; ...> on Python 3.7 and later)
<_sre.SRE_Match object; span=(3, 5), match='12'>
>>> m.group(0)  # The 0 can be omitted
'12'
>>> m.start(0)  # The 0 can be omitted
3
>>> m.end(0)  # The 0 can be omitted
5
>>> m.span(0)  # The 0 can be omitted
(3, 5)
In the example, a successful match returns a Match object, which provides:
Method | Description |
---|---|
group([group1, ...]) | Returns the string matched by one or more groups; to get the entire matched substring, use group() or group(0) |
start([group]) | Returns the start position, within the whole string, of the substring matched by the group (the index of its first character); group defaults to 0 |
end([group]) | Returns the end position, within the whole string, of the substring matched by the group (the index of its last character + 1); group defaults to 0 |
span([group]) | Returns (start(group), end(group)) |
Example (continued):
>>> import re
>>> pattern = re.compile(r'([a-z]+) ([a-z]+)', re.I)  # re.I means ignore case
>>> m = pattern.match('hello world wide web')
>>> print(m)  # The match succeeds and a Match object is returned
<_sre.SRE_Match object; span=(0, 11), match='hello world'>
>>> m.group(0)  # The entire matched substring
'hello world'
>>> m.span()  # The span of the entire matched substring
(0, 11)
>>> m.group(1)  # The substring matched by the first group
'hello'
>>> m.span(1)  # The span of the substring matched by the first group
(0, 5)
>>> m.group(2)  # The substring matched by the second group
'world'
>>> m.span(2)  # The span of the substring matched by the second group
(6, 11)
>>> m.groups()  # Equivalent to (m.group(1), m.group(2), ...)
('hello', 'world')
>>> m.group(3)  # There is no third group, so an error is raised
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    m.group(3)
IndexError: no such group
findall() function:
Finds all substrings in the string that match the regular expression and returns them as a list; if nothing matches, it returns an empty list.
Note: match and search return at most one match, while findall returns all matches.
Syntax format: pattern.findall(string[, pos[, endpos]]), where pattern is a compiled pattern object
Parameters:
Parameter | Description |
---|---|
string | The string to search |
pos | Optional. The position in the string at which to start searching (default 0) |
endpos | Optional. The position in the string at which to stop searching (default: the length of the string) |
Examples:
>>> import re
>>> pattern = re.compile(r'\d+')  # Find the numbers
>>> result1 = pattern.findall('runoob 123 google 456')
>>> result2 = pattern.findall('run88oob123google456', 0, 10)
>>> print(result1)
['123', '456']
>>> print(result2)
['88', '12']
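What findall returns also depends on how many capturing groups the pattern has. The sketch below uses the module-level re.findall and an invented sample string to illustrate this.

```python
import re

# Hypothetical sample text.
text = 'width=10, height=25'

# No groups: a list of whole matches.
numbers = re.findall(r'\d+', text)

# One group: a list of that group's text.
keys = re.findall(r'(\w+)=\d+', text)

# Several groups: a list of tuples, one tuple per match.
pairs = re.findall(r'(\w+)=(\d+)', text)

print(numbers)   # ['10', '25']
print(keys)      # ['width', 'height']
print(pairs)     # [('width', '10'), ('height', '25')]
```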
re.finditer() function:
Similar to findall: finds all substrings in the string that match the regular expression, but returns them as an iterator of match objects.
Syntax format: re.finditer(pattern, string, flags=0)
Parameters:
Parameter | Description |
---|---|
pattern | The regular expression to match |
string | The string to search |
flags | Optional flags that control how matching is done |
Examples:
>>> import re
>>> it = re.finditer(r'\d+', '12a32bc43jf3')
>>> for match in it:
...     print(match.group())
...
12
32
43
3
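Since finditer yields match objects rather than strings, positions are available as well as text. A minimal sketch, reusing the string from the example above (the variable name is illustrative):

```python
import re

# Collect both the matched text and its (start, end) span.
matches = [(m.group(), m.span()) for m in re.finditer(r'\d+', '12a32bc43jf3')]

print(matches)
# [('12', (0, 2)), ('32', (3, 5)), ('43', (7, 9)), ('3', (11, 12))]
```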
re.split() function:
The split method splits the string at every substring that the pattern matches and returns the pieces as a list. The syntax is as follows:
re.split(pattern, string[, maxsplit=0, flags=0])
Parameters:
Parameter | Description |
---|---|
pattern | The regular expression to match |
string | The string to split |
maxsplit | The maximum number of splits; maxsplit=1 splits once. The default, 0, means no limit |
flags | Optional flags that control how matching is done |
Examples:
>>> import re
>>> re.split(r'\W+', 'runoob, runoob, runoob.')
['runoob', 'runoob', 'runoob', '']
>>> re.split(r'(\W+)', ' runoob, runoob, runoob.')
['', ' ', 'runoob', ', ', 'runoob', ', ', 'runoob', '.', '']
>>> re.split(r'\W+', ' runoob, runoob, runoob.', 1)
['', 'runoob, runoob, runoob.']
>>> # 'a*' can match the empty string, and since Python 3.7 re.split also splits on empty matches, so 'a+' is used here
>>> re.split('a+', 'hello world')  # If the pattern matches nowhere, the string is not split
['hello world']
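A common use of re.split is splitting on more than one delimiter at once. The following is a small sketch with an invented input string; the variable names are illustrative.

```python
import re

# Hypothetical input using both ',' and ';' as separators.
data = 'a, b;c,  d'

fields = re.split(r'[,;]\s*', data)                  # split on either delimiter
limited = re.split(r'[,;]\s*', data, maxsplit=2)     # at most two splits
with_delims = re.split(r'([,;])\s*', 'a, b;c')       # a capturing group keeps the delimiters

print(fields)        # ['a', 'b', 'c', 'd']
print(limited)       # ['a', 'b', 'c,  d']
print(with_delims)   # ['a', ',', 'b', ';', 'c']
```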
Regular expression objects:
· re.compile() returns a compiled pattern object (RegexObject; named re.Pattern in current Python versions)
· Match object (re.MatchObject; named re.Match in current Python versions):
    · group(): Returns the string matched by the RE
    · start(): Returns the position where the match starts
    · end(): Returns the position where the match ends
    · span(): Returns a tuple (start, end) containing the positions of the match
Regular expression modifiers (optional flags):
Regular expressions accept optional flag modifiers that control how matching is done. Multiple flags can be combined with bitwise OR (|); for example, re.I | re.M sets both the I and M flags:
Modifier | Function |
---|---|
re.I | Makes matching case insensitive (ignore case) |
re.L | Does locale-aware matching |
re.M | Multi-line matching; affects ^ and $ |
re.S | Makes . match all characters, including a newline |
re.U | Interprets characters according to the Unicode character set; this flag affects \w, \W, \b, \B |
re.X | Allows a more flexible layout (whitespace and comments), so that regular expressions are easier to read |
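Combining flags with bitwise OR can be sketched as follows. The sample text is invented, and the inline form (?im) is shown only as an equivalent spelling of the same two flags.

```python
import re

# Hypothetical three-line sample text.
text = 'Item one\nitem two\nITEM three'

# re.I alone: '^' only matches at the start of the whole string.
only_i = re.findall(r'^item', text, re.I)

# re.I | re.M: case-insensitive, and '^' matches at the start of each line.
both = re.findall(r'^item', text, re.I | re.M)

# The same flags written inline at the start of the pattern.
inline = re.findall(r'(?im)^item', text)

print(only_i)   # ['Item']
print(both)     # ['Item', 'item', 'ITEM']
print(inline)   # ['Item', 'item', 'ITEM']
```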
Regular expression patterns:
· A pattern string uses a special syntax to represent a regular expression;
· Letters and digits represent themselves: a letter or digit in a pattern matches the same character in the string;
· Most letters and digits take on a different meaning when preceded by a backslash;
· Many punctuation marks have a special meaning and match themselves only when escaped;
· The backslash itself must be escaped with a backslash;
· Since regular expressions usually contain backslashes, it is best to write them as raw strings;
· Pattern elements such as r'\t' (equivalent to '\\t') match the corresponding special characters;
· The following table lists the special elements of the regular expression pattern syntax. If you supply optional flags along with a pattern, the meaning of some pattern elements changes:
Pattern | Description |
---|---|
^ | Matches the beginning of the string |
$ | Matches the end of the string |
. | Matches any character except a newline; when the DOTALL flag (re.S) is specified, it matches any character including a newline |
[...] | A set of characters, listed individually: [amk] matches 'a', 'm' or 'k' |
[^...] | Any character not in the set: [^abc] matches any character other than a, b, c |
re* | Matches zero or more occurrences of the preceding expression |
re+ | Matches one or more occurrences of the preceding expression |
re? | Matches zero or one occurrence of the preceding expression. Placed after another quantifier (*?, +?, ??, {n,m}?), it makes that quantifier non-greedy |
re{n} | Matches exactly n occurrences of the preceding expression (for example, "o{2}" cannot match the "o" in "Bob", but it can match the two o's in "food") |
re{n,} | Matches at least n occurrences of the preceding expression. For example, "o{2,}" cannot match the "o" in "Bob", but it can match all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*" |
re{n,m} | Matches from n to m occurrences of the preceding expression, greedily |
a\|b | Matches a or b |
(re) | Matches the expression in parentheses and also captures it as a group |
(?imx) | Turns on the i, m, or x flag from inside the pattern. In Python's re module this form applies to the whole pattern and belongs at its start |
(?-imx) | Turns off the i, m, or x flag (Perl-style syntax; Python's re module only accepts the scoped form (?-imx:re)) |
(?:re) | Like (...), but does not create a capturing group |
(?imx:re) | Turns on the i, m, or x flag within the parentheses only |
(?-imx:re) | Turns off the i, m, or x flag within the parentheses only |
(?#...) | A comment |
(?=re) | Positive lookahead assertion. Succeeds if re matches at the current position, without consuming any characters; the rest of the pattern is then tried from that same position |
(?!re) | Negative lookahead assertion. The opposite of positive lookahead: succeeds when re cannot match at the current position of the string |
(?>re) | Atomic (independent) group: once matched, it is not backtracked into (supported by Python's re module since Python 3.11) |
\w | Matches a letter, digit, or underscore |
\W | Matches any character that is not a letter, digit, or underscore |
\s | Matches any whitespace character (for ASCII text, equivalent to [ \t\n\r\f\v]) |
\S | Matches any non-whitespace character |
\d | Matches any digit (for ASCII text, equivalent to [0-9]) |
\D | Matches any non-digit |
\A | Matches at the start of the string |
\Z | Matches at the end of the string. In Perl-style engines it also matches just before a final newline; in Python's re module it matches only at the very end |
\z | Matches only at the very end of the string (Perl-style syntax; Python's re module has traditionally not accepted it, so \Z is used instead) |
\G | Matches at the position where the previous match ended (Perl-style syntax; not supported by Python's re module) |
\b | Matches a word boundary, that is, the position between a word character and a non-word character (for example, 'er\b' can match the 'er' in "never" but not the 'er' in "verb") |
\B | Matches a non-word boundary (for example, 'er\B' can match the 'er' in "verb" but not the 'er' in "never") |
\n, \t, etc. | Matches a newline character, a tab character, etc. |
\1...\9 | Matches the content of the nth captured group |
\10 | Matches the content of the 10th group if that group exists; otherwise it is read as an octal character code (in Python's re module, an octal escape must start with 0 or have three digits) |
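A few of the rules and table entries above can be sketched in code: raw strings and escaping, greedy versus non-greedy quantifiers, counted repetition, lookahead, and backreferences. All sample strings and variable names here are invented for illustration.

```python
import re

# A raw string keeps the backslash, so r'\t' and '\\t' are the same pattern.
same_pattern = (r'\t' == '\\t')
tab = re.search(r'\t', 'a\tb')           # matches the tab character
dots = re.findall(r'\.', 'a.b.c')        # an escaped '.' matches a literal dot

# re.escape builds a pattern that matches arbitrary text literally.
literal = '1+1=2'
exact = re.fullmatch(re.escape(literal), literal)

# Greedy vs. non-greedy quantifiers.
html = '<b>bold</b> and <i>italic</i>'
greedy = re.findall(r'<.+>', html)       # one match spanning the whole string
lazy = re.findall(r'<.+?>', html)        # each tag separately

# Counted repetition: two or more 'o' characters.
runs = re.findall(r'o{2,}', 'Bob food foooood')

# Lookahead checks what follows without consuming it.
px_values = re.findall(r'\d+(?=px)', '10px 20em 30px')
other_values = re.findall(r'\d+(?!\d|px)', '10px 20em 30px')

# \1 refers back to whatever the first group matched.
repeated = re.search(r'\b(\w+) \1\b', 'this is is a test')

print(same_pattern)       # True
print(dots)               # ['.', '.']
print(lazy)               # ['<b>', '</b>', '<i>', '</i>']
print(runs)               # ['oo', 'ooooo']
print(px_values)          # ['10', '30']
print(other_values)       # ['20']
print(repeated.group())   # is is
```

In the negative lookahead, the extra \d alternative stops the engine from backtracking to a shorter number such as '1' in '10px'.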
Examples of regular expressions:
Character matching:
Pattern | Description |
---|---|
[Pp]ython | Matches "Python" or "python" |
rub[ye] | Matches "ruby" or "rube" |
[aeiou] | Matches any one of the letters in the brackets |
[0-9] | Matches any digit |
[a-z] | Matches any lowercase letter |
[A-Z] | Matches any uppercase letter |
[a-zA-Z0-9] | Matches any letter or digit |
[^aeiou] | Matches any character other than the letters a, e, i, o, u |
[^0-9] | Matches any character other than a digit |
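The character sets in the table can be tried out with findall. This is a quick sketch using an invented sample sentence.

```python
import re

# Hypothetical sample text.
text = 'Python and python, ruby and rube, 42 apples'

names = re.findall(r'[Pp]ython', text)
rubies = re.findall(r'rub[ye]', text)
digits = re.findall(r'[0-9]', text)
capitals = re.findall(r'[A-Z]', text)
non_vowels = re.findall(r'[^aeiou]', 'audio')

print(names)        # ['Python', 'python']
print(rubies)       # ['ruby', 'rube']
print(digits)       # ['4', '2']
print(capitals)     # ['P']
print(non_vowels)   # ['d']
```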
Special character class:
Pattern | Description |
---|---|
. | Matches any single character except "\n"; to match any character including "\n", use the re.S flag or a pattern such as [\s\S] |
\d | Matches a digit character |
\D | Matches a non-digit character |
\s | Matches any whitespace character |
\S | Matches any non-whitespace character |
\w | Matches any word character, including the underscore |
\W | Matches any non-word character |
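The special character classes can be sketched the same way, including the difference between '.' with and without re.S. The sample text is made up for illustration.

```python
import re

# Hypothetical two-line sample text.
text = 'id_1: 42\nname: Ann'

digits = re.findall(r'\d', text)         # digit characters
words = re.findall(r'\w+', text)         # runs of word characters (underscore included)
spaces = re.findall(r'\s', text)         # whitespace, including the newline
symbols = re.findall(r'[^\w\s]', text)   # neither word characters nor whitespace

# '.' skips the newline unless re.S is set; [\s\S] matches everything.
no_newline = re.findall(r'.', 'a\nb')
with_flag = re.findall(r'.', 'a\nb', re.S)
with_class = re.findall(r'[\s\S]', 'a\nb')

print(digits)       # ['1', '4', '2']
print(words)        # ['id_1', '42', 'name', 'Ann']
print(symbols)      # [':', ':']
print(no_newline)   # ['a', 'b']
```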