How Python parses XML

**What is XML?**

XML stands for eXtensible Markup Language, a subset of the Standard Generalized Markup Language (SGML). It is a markup language used to give electronic documents structure.

XML is designed to transmit and store data.

XML is a set of rules for defining semantic markup that divides a document into parts and identifies those parts.

It is also a meta-markup language: it defines a syntax that can be used to create other semantic, structured markup languages for specific fields.

Python's parsing of XML

The two most common XML programming interfaces are DOM and SAX. They process XML files in different ways and therefore suit different situations.

Python offers three ways to parse XML: SAX, DOM, and ElementTree. The first two are described here:

1. SAX (Simple API for XML)

The Python standard library includes a SAX parser. SAX uses an event-driven model to process XML files by triggering events one by one during the process of parsing XML and calling user-defined callback functions.

2. DOM (Document Object Model)

DOM parses the XML data into a tree in memory, and the XML is then manipulated through operations on that tree.
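The third option, ElementTree, is not covered in the rest of this chapter. As a minimal sketch only (the inline XML string and the variable names are illustrative assumptions, not part of the example file), the standard-library xml.etree.ElementTree module builds a lightweight tree that can be queried like this:

```python
import xml.etree.ElementTree as ET

# Illustrative sample; a cut-down version of the kind of data used in this chapter.
XML_TEXT = """<collection shelf="New Arrivals">
<movie title="Enemy Behind"><type>War, Thriller</type><year>2003</year></movie>
<movie title="Ishtar"><type>Comedy</type></movie>
</collection>"""

# fromstring() parses the text and returns the root element.
root = ET.fromstring(XML_TEXT)

# Attributes are read with get(); iter() walks all matching descendants.
shelf = root.get("shelf")
titles = [movie.get("title") for movie in root.iter("movie")]

# find() accepts a simple path and returns the first match.
first_type = root.find("movie/type").text

print(shelf, titles, first_type)
```

Like DOM, this approach keeps the whole document in memory, so it suits small and medium-sized files.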

The content of the XML example file movies.xml used in this chapter is as follows:

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>

Python uses SAX to parse XML

SAX is an event-driven API.

Using SAX to parse an XML document involves two parts: a parser and an event handler.

The parser is responsible for reading the XML document and sending events to the event handler, such as element start and element end events.

The event handler is responsible for responding to the event and processing the XML data passed.
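The division of labour between parser and handler can be sketched as follows. This is only an illustration: the TagCounter class and the sample string are hypothetical names chosen here, and the handler reacts to just one kind of event.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts how often each element starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # The parser calls this once for every start tag it reads.
        self.counts[name] = self.counts.get(name, 0) + 1


# Illustrative input, not the movies.xml file from this chapter.
SAMPLE = b"<collection><movie/><movie/></collection>"

counter = TagCounter()
# The parser reads the data and sends events to the handler object.
xml.sax.parseString(SAMPLE, counter)
print(counter.counts)
```

The parser never stores the document itself; whatever should be kept has to be recorded by the handler.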

To process XML with SAX in Python, first import the parse function from xml.sax and the ContentHandler class from xml.sax.handler.

Introduction to the ContentHandler class methods

characters(content) method

When it is called:

From the start of a line up to the first tag, any characters found are passed as content.

From one tag up to the next tag, any characters found are passed as content.

From a tag up to the end-of-line character, any characters found are passed as content.

The tag may be either a start tag or an end tag.
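Because of these rules, and because a SAX parser is allowed to deliver the text of one element in several pieces, a handler should not assume that a single characters() call holds the complete text. A common way to cope with this, sketched here with a hypothetical TextCollector class and sample data, is to gather the pieces and join them when the element ends:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: joins all text pieces of each element."""

    def __init__(self):
        super().__init__()
        self.chunks = []  # pieces received since the last tag
        self.texts = {}   # element name -> complete text

    def startElement(self, name, attrs):
        self.chunks = []

    def characters(self, content):
        # May be called more than once per element, and also for
        # the whitespace between tags.
        self.chunks.append(content)

    def endElement(self, name):
        self.texts[name] = "".join(self.chunks).strip()
        self.chunks = []


# Illustrative input; the entity makes a split into several pieces likely.
SAMPLE = b"<movie>\n  <description>Tom &amp; Jerry</description>\n  <year>2003</year>\n</movie>"

collector = TextCollector()
xml.sax.parseString(SAMPLE, collector)
print(collector.texts["description"], collector.texts["year"])
```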

startDocument() method

Called when the document is started.

endDocument() method

Called when the parser reaches the end of the document.

startElement(name, attrs) method

Called when an XML start tag is encountered. name is the name of the tag, and attrs is a dictionary-like object holding the tag's attributes.

endElement(name) method

Called when an XML end tag is encountered.
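To see the order in which these methods fire, the following sketch records every call in a list. The EventLogger class and the one-line document are assumptions made for illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler: records each callback in the order received."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        # attrs.items() yields (attribute name, value) pairs.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))


logger = EventLogger()
xml.sax.parseString(b'<movie title="Ishtar"><format>VHS</format></movie>', logger)
for event in logger.events:
    print(event)
```

Start and end events are nested in the same way as the tags in the document, which is what allows a handler to track where it currently is.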

make_parser method

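The following function creates a new parser object and returns it.

xml.sax.make_parser([parser_list])

Parameter Description:

parser_list: optional; a list of names of parser modules to try. The first parser that can be created is used.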
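A minimal sketch of working with the returned parser object is given below. The TitleCollector class and the inline data are hypothetical; an in-memory byte stream stands in for a file so that the snippet is self-contained.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class TitleCollector(ContentHandler):
    """Hypothetical handler: collects the title attribute of every movie."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        if name == "movie":
            self.titles.append(attrs.get("title"))


# Illustrative data; a real program would pass a file name instead.
DATA = b'<collection><movie title="Trigun"/><movie title="Ishtar"/></collection>'

parser = xml.sax.make_parser()             # create the XMLReader
parser.setFeature(feature_namespaces, 0)   # plain (non-namespace) element names
handler = TitleCollector()
parser.setContentHandler(handler)          # attach the handler
parser.parse(io.BytesIO(DATA))             # accepts a file name or a stream
print(handler.titles)
```

Creating the parser explicitly is useful when features or handlers need to be configured before parsing starts.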

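parse method

The following function creates a SAX parser and uses it to parse an XML document:

xml.sax.parse(xmlfile, contenthandler[, errorhandler])

Parameter Description:

xmlfile: the name of the XML file, or a file object, to parse.

contenthandler: must be a ContentHandler object.

errorhandler: optional; if given, it must be a SAX ErrorHandler object. If omitted, a SAXParseException is raised on every error.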
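The sketch below calls xml.sax.parse on a file and also shows what happens with malformed input when no error handler is supplied. The RatingCollector class, the parse_ratings helper, and the temporary file name are all hypothetical.

```python
import os
import tempfile
import xml.sax


class RatingCollector(xml.sax.ContentHandler):
    """Hypothetical handler: gathers the text of every <rating> element."""

    def __init__(self):
        super().__init__()
        self.in_rating = False
        self.ratings = []

    def startElement(self, name, attrs):
        if name == "rating":
            self.in_rating = True
            self.ratings.append("")

    def characters(self, content):
        if self.in_rating:
            self.ratings[-1] += content

    def endElement(self, name):
        if name == "rating":
            self.in_rating = False


def parse_ratings(xml_text):
    """Write xml_text to a temporary file and parse that file."""
    handler = RatingCollector()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.xml")
        with open(path, "w", encoding="utf-8") as f:
            f.write(xml_text)
        xml.sax.parse(path, handler)
    return handler.ratings


good = parse_ratings(
    "<collection><movie><rating>PG</rating></movie>"
    "<movie><rating>R</rating></movie></collection>"
)

# Mismatched tags: without a custom error handler an exception is raised.
try:
    parse_ratings("<collection><movie></collection>")
    error = None
except xml.sax.SAXParseException as exc:
    error = exc

print(good, type(error).__name__)
```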

parseString method

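The parseString function creates an XML parser and uses it to parse an XML string:

xml.sax.parseString(xmlstring, contenthandler[, errorhandler])

Parameter Description:

xmlstring: the XML string to parse.

contenthandler: must be a ContentHandler object.

errorhandler: optional; if given, it must be a SAX ErrorHandler object.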
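As a small illustration of parseString, the following sketch adds up the values of all stars elements in an in-memory document. The StarCounter class and the sample string are assumptions; passing a str (rather than bytes) requires Python 3.5 or later.

```python
import xml.sax


class StarCounter(xml.sax.ContentHandler):
    """Hypothetical handler: sums the numbers found in <stars> elements."""

    def __init__(self):
        super().__init__()
        self._buffer = None  # list of text pieces while inside <stars>
        self.total = 0

    def startElement(self, name, attrs):
        if name == "stars":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "stars":
            self.total += int("".join(self._buffer))
            self._buffer = None


# Illustrative data held in memory; no file is involved.
XML_STRING = (
    "<collection>"
    "<movie><stars>10</stars></movie>"
    "<movie><stars>8</stars></movie>"
    "</collection>"
)

star_counter = StarCounter()
xml.sax.parseString(XML_STRING, star_counter)
print(star_counter.total)
```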

Python parsing XML example

#!/usr/bin/python3

import xml.sax
import xml.sax.handler


class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "movie":
            print("*****Movie*****")
            title = attributes["title"]
            print("Title:", title)

    # Called when an element ends
    def endElement(self, tag):
        if self.CurrentData == "type":
            print("Type:", self.type)
        elif self.CurrentData == "format":
            print("Format:", self.format)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "rating":
            print("Rating:", self.rating)
        elif self.CurrentData == "stars":
            print("Stars:", self.stars)
        elif self.CurrentData == "description":
            print("Description:", self.description)
        # Reset so that whitespace between elements is ignored
        self.CurrentData = ""

    # Called when character data is read
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content


if __name__ == "__main__":
    # Create an XMLReader
    parser = xml.sax.make_parser()
    # Turn off namespace processing
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)

    # Replace the default ContentHandler with our own
    handler = MovieHandler()
    parser.setContentHandler(handler)

    parser.parse("movies.xml")

The execution result of the above code is as follows:

*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Year: 2003
Rating: PG
Stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Year: 1989
Rating: R
Stars: 8
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Stars: 10
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Stars: 2
Description: Viewable boredom

For complete SAX API documentation, please refer to Python SAX APIs.

Use xml.dom to parse XML

The Document Object Model (DOM) is a standard programming interface, recommended by the W3C, for processing XML.

When a DOM parser parses an XML document, it reads the entire document at once and stores all of its elements in a tree structure in memory. You can then use the functions provided by the DOM to read or modify the document, and you can also write the modified content back to an XML file.
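The read, modify, and write cycle described above can be sketched with the standard-library xml.dom.minidom module. The inline document and the new values are illustrative assumptions, and an in-memory buffer stands in for the output file.

```python
import io
from xml.dom.minidom import parseString

# Illustrative document held in memory.
dom = parseString(
    '<collection shelf="New Arrivals">'
    '<movie title="Ishtar"><rating>PG</rating></movie>'
    "</collection>"
)
collection = dom.documentElement

# Read: attributes and child elements are reached through the tree.
old_shelf = collection.getAttribute("shelf")
movie = collection.getElementsByTagName("movie")[0]

# Modify: change an attribute and append a new child element.
collection.setAttribute("shelf", "Classics")
stars = dom.createElement("stars")
stars.appendChild(dom.createTextNode("2"))
movie.appendChild(stars)

# Write: serialize the tree; with a real file, pass the open file object.
buffer = io.StringIO()
dom.writexml(buffer)
updated_xml = buffer.getvalue()
print(updated_xml)
```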

In Python, xml.dom.minidom can be used to parse XML files. An example follows:

#!/usr/bin/python3

import xml.dom.minidom

# Open the XML document with the minidom parser
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
    print("Root element : %s" % collection.getAttribute("shelf"))

# Get all movies in the collection
movies = collection.getElementsByTagName("movie")

# Print detailed information about each movie
for movie in movies:
    print("*****Movie*****")
    if movie.hasAttribute("title"):
        print("Title: %s" % movie.getAttribute("title"))

    # Each of these elements holds one text node; .data is its text.
    # The "_node" names avoid shadowing the built-ins type() and format().
    type_node = movie.getElementsByTagName("type")[0]
    print("Type: %s" % type_node.childNodes[0].data)
    format_node = movie.getElementsByTagName("format")[0]
    print("Format: %s" % format_node.childNodes[0].data)
    rating_node = movie.getElementsByTagName("rating")[0]
    print("Rating: %s" % rating_node.childNodes[0].data)
    description_node = movie.getElementsByTagName("description")[0]
    print("Description: %s" % description_node.childNodes[0].data)

The execution results of the above program are as follows:

Root element : New Arrivals
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Rating: PG
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Rating: R
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Description: Viewable boredom
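The DOM example reads only elements that every movie contains. Indexing with [0] raises an IndexError for an element that is absent, such as year in the Trigun entry. One possible way to handle optional elements is sketched below; the child_text helper and the shortened sample data are hypothetical.

```python
from xml.dom.minidom import parseString


def child_text(element, tag, default=None):
    """Return the text of the first <tag> descendant, or default if missing."""
    nodes = element.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.data


# Illustrative data: the second movie has no <year> element.
doc = parseString(
    "<collection>"
    '<movie title="Transformers"><year>1989</year></movie>'
    '<movie title="Trigun"><episodes>4</episodes></movie>'
    "</collection>"
)

years = {
    movie.getAttribute("title"): child_text(movie, "year", "unknown")
    for movie in doc.getElementsByTagName("movie")
}
print(years)
```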

For complete DOM API documentation, please refer to Python DOM APIs.
