
How To Properly Parse Parent/child Xml With Python

I have an XML parsing issue that I have been working on for the last few days, and I just can't figure it out. I've used both Python's built-in ElementTree module and LXML

Solution 1:

Here is my solution, which does not use XPath. I recommend digging a little further into the lxml documentation; there may be more elegant and direct ways to achieve this. There's a lot to explore!


from lxml import etree


class FindClasses(object):
    @staticmethod
    def parse_xml(xml_string):
        # Parse the raw XML bytes and return the root element
        return etree.fromstring(xml_string)

    def find(self, xml_string):
        # Visit every <connection> element, however deeply nested
        for parent in self.parse_xml(xml_string).iter('connection'):
            for child in parent:
                if child.tag == 'id':
                    print(child.text)
                    self.find_classes(child)

    @staticmethod
    def find_classes(child):
        connection = child.getparent()  # traversing up: <id> -> <connection>
        for classes in connection.findall('classes'):  # children of connection -> <classes>
            for cls in classes:  # children of <classes> -> <class>
                print(cls.text)
        print()  # blank line between connections


if __name__ == '__main__':
    # foo.xml, or the path to your XML file
    with open('foo.xml', 'rb') as xml_file:
        xml = xml_file.read()
    f = FindClasses()
    f.find(xml)

Output:

10
DVD
DVD_TEST

20
TV
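Solution 1 relies on lxml's getparent() to climb from each <id> back up to its <connection>. If you are limited to the standard library's xml.etree.ElementTree, whose elements have no getparent(), the same up-then-down walk can be sketched with a child-to-parent dictionary built once. This is a minimal sketch under assumptions: SAMPLE_XML is a hypothetical document reconstructed from the output above and the XPath in Solution 2, and classes_for_each_id is an illustrative name, not part of either answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical input reconstructed from the output above; the question's foo.xml may differ.
SAMPLE_XML = """<connections>
  <connection>
    <id>10</id>
    <classes><class>DVD</class><class>DVD_TEST</class></classes>
  </connection>
  <connection>
    <id>20</id>
    <classes><class>TV</class></classes>
  </connection>
</connections>"""


def classes_for_each_id(xml_text):
    """Return (id, [class texts]) pairs, walking up from each <id> like Solution 1."""
    root = ET.fromstring(xml_text)
    # ElementTree elements have no getparent(), so map every child to its parent once.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    pairs = []
    for id_elem in root.iter('id'):
        connection = parent_of[id_elem]  # traverse up: <id> -> <connection>
        # Path is relative to this <connection>, so only its own classes match.
        names = [cls.text for cls in connection.findall('classes/class')]
        pairs.append((id_elem.text, names))
    return pairs


if __name__ == '__main__':
    for conn_id, names in classes_for_each_id(SAMPLE_XML):
        print(conn_id)
        for name in names:
            print(name)
        print()
```

Building the parent map costs one pass over the tree; after that, each upward step is a dictionary lookup.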

Solution 2:

The problem is your XPath expression: it knows nothing about the logic of your nested for loop. The result of:

tree.xpath('./connections/connection/classes/class')

is a flat list of every element that matches the pattern passed to xpath(). In this case, every <class> element matches, so all of them are selected at once, regardless of which <connection> they belong to. (Selecting all of those nodes in one expression when your data is stored this way is actually the great strength of XPath.)
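To keep the classes grouped per connection, one common approach is to select each <connection> first and then evaluate a second, relative path against it. The sketch below contrasts the flat and the scoped selection. It swaps in the standard library's ElementTree findall(), which supports the small XPath subset used here, instead of lxml's xpath(), and SAMPLE_XML is a hypothetical document guessed from the path above. Because ET.fromstring() returns the <connections> element itself, the paths start at connection rather than connections.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the structure is guessed from the XPath in the answer.
SAMPLE_XML = """<connections>
  <connection><id>10</id><classes><class>DVD</class><class>DVD_TEST</class></classes></connection>
  <connection><id>20</id><classes><class>TV</class></classes></connection>
</connections>"""

root = ET.fromstring(SAMPLE_XML)  # root is the <connections> element

# One path from the root: every <class> in the document, with no grouping.
flat = [c.text for c in root.findall('./connection/classes/class')]

# Scoped: select each <connection> first, then evaluate a path relative to it.
grouped = {
    conn.findtext('id'): [c.text for c in conn.findall('./classes/class')]
    for conn in root.findall('./connection')
}

print(flat)     # ['DVD', 'DVD_TEST', 'TV']
print(grouped)  # {'10': ['DVD', 'DVD_TEST'], '20': ['TV']}
```

The relative path './classes/class' is resolved against each <connection> in turn, which is the nesting the original for loop was trying to express.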
