Parse/Read XML Document in Python

XML stands for Extensible Markup Language, and like HTML, it is a markup language. Unlike HTML, however, XML has no predefined tags; instead, we define our own custom tags based on the data we are storing in the XML file.

XML files are often used to share, store, and structure data because they can easily be transferred between servers and systems. When it comes to working with data, Python is one of the most popular programming languages for processing and parsing it.

Fortunately, Python comes with a standard xml package that can parse XML files and also write data to XML documents. In this tutorial, we refer to it as the Python XML parser.

In this Python XML tutorial, we will walk through the Python XML minidom and ElementTree modules and learn how to parse an XML document in Python.

Read XML Document in Python using minidom

minidom is a submodule of Python's standard xml package, which means you do not have to pip install anything to use it.

The minidom module parses the XML document into a Document Object Model (DOM), whose data can then be extracted using the getElementsByTagName() function.

Syntax:

from xml.dom import minidom

minidom.parse("filename")
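If the XML is already held in memory as a string instead of a file, minidom also provides parseString(), which returns the same kind of Document object as parse(). The sketch below assumes a small record layout (a <records> root containing <record> elements with <name> and <phone> children); that layout is an assumption, since the contents of demo.xml are not shown in this tutorial.

```python
from xml.dom import minidom

# Assumed layout for illustration only; the real demo.xml may differ.
XML_TEXT = """<records>
  <record><name>Jameson</name><phone>(080) 78168241</phone></record>
  <record><name>Colton</name><phone>(026) 53458662</phone></record>
</records>"""

# parseString() builds a DOM Document from a string instead of a file.
doc = minidom.parseString(XML_TEXT)

# The same lookup functions work on the resulting Document.
records = doc.getElementsByTagName("record")
print(len(records))  # 2
```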

Example:

Let’s grab all the names and phone numbers from our demo.xml file.

from xml.dom import minidom


# parse the XML file into a DOM Document object
file = minidom.parse('demo.xml')

# grab all <record> elements
records = file.getElementsByTagName("record")

print("Name------>Phone")

for record in records:
    # access the <name> and <phone> elements nested inside this record
    name = record.getElementsByTagName("name")
    phone = record.getElementsByTagName("phone")

    # print the text data of the first <name> and <phone> element
    print(name[0].firstChild.data, end="----->")
    print(phone[0].firstChild.data)

Output

Name------>Phone
Jameson----->(080) 78168241
Colton----->(026) 53458662
Dillon----->(051) 96790901
Channing----->(014) 98829753

In the program above, we first import minidom and then parse our demo.xml file with the file = minidom.parse('demo.xml') statement. The parse() function parses the XML document and returns a Document object whose documentElement attribute is the root node.
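To make the distinction between the Document and the root element concrete, here is a small sketch. It uses parseString() with a hypothetical one-record string as a stand-in for demo.xml; only the structure matters here.

```python
from xml.dom import minidom

# Hypothetical stand-in for demo.xml.
doc = minidom.parseString("<records><record><name>Dillon</name></record></records>")

# parse() and parseString() return a Document node, not the root element itself.
print(doc.nodeType == doc.DOCUMENT_NODE)  # True

# The root element is reached through the documentElement attribute.
root = doc.documentElement
print(root.tagName)  # records
```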

Note: “Our Python script and the demo.xml file are located in the same directory, which is why we only specify the file name demo.xml in the minidom.parse() function. If your Python script and XML file are located in different directories, you have to specify the absolute or relative path of the file.”
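In a script, a common way to locate a file next to the script is Path(__file__).resolve().parent / "demo.xml". The sketch below instead writes a small stand-in demo.xml (assumed content) into a temporary directory so it runs anywhere, then parses it by absolute path. One detail worth knowing: minidom.parse() treats any argument that is not a str as an open file object, so a pathlib.Path should be converted with str() first.

```python
import tempfile
from pathlib import Path
from xml.dom import minidom

# Write a stand-in demo.xml into a temporary folder (assumed content).
tmp_dir = Path(tempfile.mkdtemp())
xml_path = tmp_dir / "demo.xml"
xml_path.write_text(
    "<records><record><name>Channing</name>"
    "<phone>(014) 98829753</phone></record></records>",
    encoding="utf-8",
)

# minidom.parse() opens str paths itself; other objects are read as files,
# so convert the absolute Path to str before passing it in.
doc = minidom.parse(str(xml_path.resolve()))
print(doc.getElementsByTagName("name")[0].firstChild.data)  # Channing
```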

After parsing the XML file, we accessed all the <record> nodes using the records = file.getElementsByTagName("record") statement.

getElementsByTagName() is a method of minidom Document and Element objects that returns a NodeList of all descendant elements with the specified tag name.
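The following sketch shows two properties of that return value, using hypothetical sample XML: the search reaches descendants at any depth (not only direct children), and a tag that does not occur yields an empty NodeList rather than an error. In minidom, NodeList is a subclass of Python's list.

```python
from xml.dom import minidom

# Hypothetical XML with <name> elements at two different depths.
doc = minidom.parseString(
    "<records>"
    "<record><name>Jameson</name></record>"
    "<group><record><name>Colton</name></record></group>"
    "</records>"
)

# All descendants are searched, in document order.
names = doc.getElementsByTagName("name")
print(len(names))                          # 2
print([n.firstChild.data for n in names])  # ['Jameson', 'Colton']

# A missing tag gives an empty NodeList instead of raising.
print(len(doc.getElementsByTagName("email")))  # 0
```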

Once we had all the record nodes, we looped through them and, again using the getElementsByTagName() function, accessed each record's nested <name> and <phone> nodes.
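Indexing with [0] assumes every record actually contains a <name> and a <phone>; if one is missing, name[0] raises an IndexError. Below is a minimal sketch of a defensive lookup. The helper child_text() is hypothetical (it is not part of minidom), and the second record is made-up sample data with no <phone> element.

```python
from xml.dom import minidom

def child_text(parent, tag, default=""):
    """Hypothetical helper: text of the first <tag> under parent, or default."""
    matches = parent.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return default
    return matches[0].firstChild.data

# Assumed data: the second record has no <phone> element.
doc = minidom.parseString(
    "<records>"
    "<record><name>Dillon</name><phone>(051) 96790901</phone></record>"
    "<record><name>Sample</name></record>"
    "</records>"
)

# Calling the lookup on `record` keeps the search scoped to that one record.
rows = [
    (child_text(record, "name"), child_text(record, "phone", "n/a"))
    for record in doc.getElementsByTagName("record")
]
for name, phone in rows:
    print(name, phone)
```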

Next, after accessing the individual name and phone nodes, we printed their data using the name[0].firstChild.data and phone[0].firstChild.data statements.

firstChild is a property of every node that returns its first child node. For elements such as <name> and <phone>, that child is a Text node, and its data attribute holds the text content.
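A quick way to see this is to inspect the node types directly. In the hypothetical snippet below, <phone/> is deliberately empty: an element with no children has firstChild set to None, so accessing .data on it would raise an AttributeError.

```python
from xml.dom import minidom

doc = minidom.parseString("<record><name>Jameson</name><phone/></record>")
name = doc.getElementsByTagName("name")[0]
phone = doc.getElementsByTagName("phone")[0]

# The first child of <name> is a Text node; .data holds its string.
print(name.firstChild.nodeType == name.TEXT_NODE)  # True
print(name.firstChild.data)                        # Jameson

# An empty element has no children, so firstChild is None.
print(phone.firstChild)                            # None
```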

Conclusion

That wraps up this tutorial on the Python XML parser. As you can see, Python provides a built-in standard xml package for reading and parsing XML files. It mainly offers two submodules that can parse an XML document:

  • minidom and

  • ElementTree

The minidom module follows the Document Object Model approach to parsing an XML file. The ElementTree module, on the other hand, represents the XML document as a tree of elements.
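For comparison, here is a minimal sketch of the same name/phone extraction with xml.etree.ElementTree. It assumes the same <records>/<record>/<name>/<phone> layout used in the earlier sketches (an assumption, since demo.xml itself is not shown) and fills it with the records from the minidom example's output. For a file on disk, ET.parse('demo.xml').getroot() would return the equivalent root element.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for demo.xml, using the records from the earlier output.
XML_TEXT = """<records>
  <record><name>Jameson</name><phone>(080) 78168241</phone></record>
  <record><name>Colton</name><phone>(026) 53458662</phone></record>
  <record><name>Dillon</name><phone>(051) 96790901</phone></record>
  <record><name>Channing</name><phone>(014) 98829753</phone></record>
</records>"""

root = ET.fromstring(XML_TEXT)  # returns the root Element directly

lines = ["Name------>Phone"]
for record in root.findall("record"):  # direct <record> children of the root
    # findtext() returns the text of the first matching child element.
    lines.append(record.findtext("name") + "----->" + record.findtext("phone"))

print("\n".join(lines))
```

Unlike minidom, ElementTree exposes text through the .text attribute and findtext(), so there is no firstChild.data step.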
