**What is XML?**
XML stands for eXtensible Markup Language, a subset of the Standard Generalized Markup Language (SGML). It is a markup language used to give electronic documents a structure.
XML is designed to transmit and store data.
XML is a set of rules that define semantic markup that divides documents into parts and identifies these parts.
It is also a meta-markup language: it defines a syntax for creating other semantic, structured markup languages tailored to a specific field.
Parsing XML in Python
The most common XML programming interfaces are DOM and SAX. They process XML files in different ways, so each suits different situations.
Python offers three ways to parse XML: SAX, DOM, and ElementTree. This article covers the first two:
1. SAX (Simple API for XML)
The Python standard library includes a SAX parser. SAX uses an event-driven model: as it parses the XML file, it fires events one by one and calls user-defined callback functions to handle them.
2. DOM (Document Object Model)
DOM parses the XML data into a tree in memory, and you manipulate the XML by operating on that tree.
The content of the XML example file movies.xml used in this chapter is as follows:
<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>
Parsing XML with SAX in Python
SAX is an event-driven API.
Parsing an XML document with SAX involves two parts: a parser and an event handler.
The parser reads the XML document and sends events, such as element-start and element-end events, to the event handler.
The event handler responds to those events and processes the XML data passed to it.
To process XML with SAX in Python, first import the parse function from xml.sax and the ContentHandler class from xml.sax.handler.
ContentHandler class methods
characters(content) method
When it is called:
From the start of a line up to the first tag, if there are characters, content holds that string.
Between one tag and the next tag, if there are characters, content holds that string.
From a tag up to the end-of-line character, if there are characters, content holds that string.
The tag can be either a start tag or an end tag.
startDocument() method
Called when the document starts.
endDocument() method
Called when the parser reaches the end of the document.
startElement(name, attrs) method
Called when an XML start tag is encountered; name is the tag name and attrs is a dictionary-like object holding the tag's attributes.
endElement(name) method
Called when an XML end tag is encountered.
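To make the order of these callbacks concrete, here is a minimal sketch that records every event the parser fires. The TraceHandler class and trace_events function are hypothetical names introduced only for this illustration; whitespace-only character chunks are skipped to keep the trace short.

```python
import xml.sax


class TraceHandler(xml.sax.ContentHandler):
    """Records each callback the parser invokes, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        # Copy the attributes into a plain dict so they can be inspected later
        self.events.append(("startElement", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def characters(self, content):
        # Skip whitespace-only chunks to keep the trace short
        if content.strip():
            self.events.append(("characters", content))


def trace_events(xml_bytes):
    """Parse xml_bytes and return the list of recorded events."""
    handler = TraceHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


if __name__ == "__main__":
    for event in trace_events(b'<movie title="Ishtar"><format>VHS</format></movie>'):
        print(event)
```

Running it on a small document shows startDocument first, endDocument last, and each element's start, characters, and end events nested in between.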
make_parser method
The following method creates a new parser object and returns it.
xml.sax.make_parser([parser_list])
Parameter description:
parser_list: optional; a list of parser module names to try before the default parsers.
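As a rough illustration of make_parser, the sketch below creates a parser, attaches a handler, and parses from an in-memory byte stream. TagCounter and count_tags are hypothetical names used only here, and io.BytesIO stands in for a real file.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class TagCounter(ContentHandler):
    """Hypothetical handler that counts how often each tag name starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(xml_bytes):
    """Return a dict mapping tag name to number of occurrences."""
    parser = xml.sax.make_parser()            # use the default parser list
    parser.setFeature(feature_namespaces, 0)  # plain, non-namespace element events
    handler = TagCounter()
    parser.setContentHandler(handler)
    # parse() accepts a file name or a file-like object
    parser.parse(io.BytesIO(xml_bytes))
    return handler.counts
```

Calling make_parser explicitly is mainly useful when the parser needs configuring (for example with setFeature) before parsing starts.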
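parse method
The following method creates a SAX parser and parses the XML document:
xml.sax.parse(xmlfile, contenthandler[, errorhandler])
Parameter description:
xmlfile: the name of the XML file (or a file-like object).
contenthandler: must be a ContentHandler object.
errorhandler: optional; if given, must be a SAX ErrorHandler object.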
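When no errorhandler is passed, the default one raises an exception on malformed input. The following sketch, under that assumption, uses xml.sax.parse to check whether a source is well-formed; check_well_formed is a hypothetical helper, not part of the library.

```python
import io
import xml.sax


def check_well_formed(source):
    """Return (True, None) if source parses, else (False, message)."""
    try:
        # A bare ContentHandler ignores every event; only errors matter here
        xml.sax.parse(source, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return False, "line %d: %s" % (exc.getLineNumber(), exc.getMessage())
    return True, None


if __name__ == "__main__":
    print(check_well_formed(io.BytesIO(b"<a><b/></a>")))
    print(check_well_formed(io.BytesIO(b"<a><b></a>")))
```

The source argument can be a file name such as "movies.xml" or any file-like object; the byte streams above are used only so the example needs no file on disk.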
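parseString method
The parseString method creates an XML parser and parses an XML string:
xml.sax.parseString(xmlstring, contenthandler[, errorhandler])
Parameter description:
xmlstring: the XML string to parse.
contenthandler: must be a ContentHandler object.
errorhandler: optional; if given, must be a SAX ErrorHandler object.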
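A short sketch of parseString follows. It collects the title attribute of every movie element from XML held in memory; TitleCollector and movie_titles are hypothetical names chosen for this example.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers the title of each <movie> element."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        if name == "movie":
            # get() returns None when the attribute is absent
            self.titles.append(attrs.get("title"))


def movie_titles(xml_bytes):
    """Return the list of movie titles found in xml_bytes."""
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles
```

This is convenient when the XML arrives from a network response or a test fixture rather than a file.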
Python parsing XML example
#!/usr/bin/python3

import xml.sax


class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "movie":
            print("*****Movie*****")
            title = attributes["title"]
            print("Title:", title)

    # Called when an element ends
    def endElement(self, tag):
        if self.CurrentData == "type":
            print("Type:", self.type)
        elif self.CurrentData == "format":
            print("Format:", self.format)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "rating":
            print("Rating:", self.rating)
        elif self.CurrentData == "stars":
            print("Stars:", self.stars)
        elif self.CurrentData == "description":
            print("Description:", self.description)
        # Reset so whitespace between elements is ignored
        self.CurrentData = ""

    # Called when character data is read
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content


if __name__ == "__main__":
    # Create an XMLReader
    parser = xml.sax.make_parser()
    # Turn off namespace processing
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    # Install our handler in place of the default ContentHandler
    handler = MovieHandler()
    parser.setContentHandler(handler)
    parser.parse("movies.xml")
The execution result of the above code is as follows:
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Year: 2003
Rating: PG
Stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Year: 1989
Rating: R
Stars: 8
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Stars: 10
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Stars: 2
Description: Viewable boredom
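One caveat about the handler above: a SAX parser may deliver the text of a single element in several characters() calls, so assigning content directly keeps only the last chunk. This works for the short values in movies.xml, but a more defensive variant buffers the chunks and joins them when the element ends. The sketch below illustrates that idea; TextCollector and collect_text are hypothetical names, not part of the original example.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects the full text of each leaf element as (tag, text) pairs."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self.fields = []

    def startElement(self, name, attrs):
        # Start a fresh buffer for this element
        self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text
        self._chunks.append(content)

    def endElement(self, name):
        text = "".join(self._chunks).strip()
        if text:
            self.fields.append((name, text))
        self._chunks = []


def collect_text(xml_bytes):
    """Return a list of (tag, text) pairs for elements that contain text."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.fields
```

Text containing an entity such as &amp;amp; is a typical case where the parser splits the data into more than one chunk.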
For the complete SAX API documentation, see the Python SAX APIs.
Parsing XML with xml.dom
The Document Object Model (DOM) is a standard programming interface, recommended by the W3C, for processing XML documents.
When a DOM parser parses an XML document, it reads the entire document at once and stores all of its elements in a tree structure in memory. You can then use the functions the DOM provides to read or modify the document, and you can also write the modified content back to an XML file.
In Python, xml.dom.minidom is used to parse XML files. An example follows:
#!/usr/bin/python3

import xml.dom.minidom

# Open the XML document with the minidom parser
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
    print("Root element : %s" % collection.getAttribute("shelf"))

# Get all movies in the collection
movies = collection.getElementsByTagName("movie")

# Print detailed information about each movie
for movie in movies:
    print("*****Movie*****")
    if movie.hasAttribute("title"):
        print("Title: %s" % movie.getAttribute("title"))

    # Names end in _el to avoid shadowing the built-ins type() and format()
    type_el = movie.getElementsByTagName('type')[0]
    print("Type: %s" % type_el.childNodes[0].data)
    format_el = movie.getElementsByTagName('format')[0]
    print("Format: %s" % format_el.childNodes[0].data)
    rating_el = movie.getElementsByTagName('rating')[0]
    print("Rating: %s" % rating_el.childNodes[0].data)
    description_el = movie.getElementsByTagName('description')[0]
    print("Description: %s" % description_el.childNodes[0].data)
The execution results of the above program are as follows:
Root element : New Arrivals
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Rating: PG
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Rating: R
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Description: Viewable boredom
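The DOM example reads only elements that every movie has. Indexing [0] on a missing element such as year (absent for Trigun and Ishtar) would raise an IndexError. A small helper can return a default instead; the sketch below assumes hypothetical names child_text and years_by_title and a shortened sample document.

```python
import xml.dom.minidom


def child_text(element, tag, default=None):
    """Return the text of the first <tag> descendant, or default if missing."""
    nodes = element.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.data


# Shortened sample: the second movie has no <year> element
SAMPLE = """<collection>
  <movie title="Enemy Behind"><year>2003</year></movie>
  <movie title="Trigun"><episodes>4</episodes></movie>
</collection>"""


def years_by_title(xml_text):
    """Map each movie title to its year, or 'unknown' when no year is given."""
    doc = xml.dom.minidom.parseString(xml_text)
    return {
        movie.getAttribute("title"): child_text(movie, "year", "unknown")
        for movie in doc.getElementsByTagName("movie")
    }
```

With the sample above, the movie without a year maps to "unknown" rather than causing an exception.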
For complete DOM API documentation, please refer to Python DOM APIs.
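As noted earlier, the DOM tree can also be modified and serialized again. The following is a minimal sketch of that workflow with minidom; the function name update_collection and the placeholder movie data are assumptions made for illustration.

```python
import xml.dom.minidom


def update_collection(xml_text, new_shelf):
    """Change the root's shelf attribute, append a movie, return the new XML."""
    doc = xml.dom.minidom.parseString(xml_text)
    root = doc.documentElement
    root.setAttribute("shelf", new_shelf)

    # Build <movie title="..."><format>DVD</format></movie> and attach it
    movie = doc.createElement("movie")
    movie.setAttribute("title", "Example Title")  # placeholder data
    fmt = doc.createElement("format")
    fmt.appendChild(doc.createTextNode("DVD"))
    movie.appendChild(fmt)
    root.appendChild(movie)

    # Serialize the modified element tree back to a string
    return root.toxml()


if __name__ == "__main__":
    print(update_collection('<collection shelf="New Arrivals"/>', "Classics"))
```

To persist the change, the returned string can be written to a file with an ordinary open(..., "w") call.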