How to Properly Parse Parent/Child XML with Python
I have an XML parsing issue that I've been working on for the last few days, and I just can't figure it out. I've used both Python's built-in ElementTree and lxml.
Solution 1:
Here is my solution, which doesn't use XPath. I'd also recommend digging further into the lxml documentation: there may be more elegant and direct ways to achieve this, and there's a lot to explore.
from lxml import etree


class FindClasses(object):
    @staticmethod
    def parse_xml(xml_bytes):
        # Parse the raw bytes straight into the root element
        # (no need to serialize with tostring() and parse a second time)
        return etree.fromstring(xml_bytes)

    def find(self, xml_bytes):
        # iter() replaces the deprecated getiterator(); visits every <connection>
        for parent in self.parse_xml(xml_bytes).iter('connection'):
            for child in parent:
                if child.tag == 'id':
                    print(child.text)
                    self.find_classes(child)

    @staticmethod
    def find_classes(id_element):
        connection = id_element.getparent()  # traversing up -> <connection>
        for section in connection:  # children of <connection> -> <id>, <classes>
            if section.tag == 'classes':
                for cls in section:  # children of <classes> -> <class>
                    print(cls.text)
        print()  # blank line between connections


if __name__ == '__main__':
    with open('foo.xml', 'rb') as xml_file:  # foo.xml or path to your XML file
        xml = xml_file.read()
    FindClasses().find(xml)
Output:
10
DVD
DVD_TEST
20
TV
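For comparison, here is a sketch of the same loop-based walk using Python's built-in xml.etree.ElementTree instead of lxml. ElementTree elements have no getparent(), so this version walks down from each <connection> rather than climbing back up from <id>. The XML string is hypothetical, reconstructed from the output above, and the sketch assumes each <id> comes before its <classes> element, as that output implies.

```python
import xml.etree.ElementTree as ET

# Hypothetical input, reconstructed from the output shown above
XML = b"""<data><connections>
  <connection><id>10</id><classes><class>DVD</class><class>DVD_TEST</class></classes></connection>
  <connection><id>20</id><classes><class>TV</class></classes></connection>
</connections></data>"""


def find_classes(xml_bytes):
    """Return the id and class texts of every <connection>, in document order."""
    lines = []
    root = ET.fromstring(xml_bytes)
    for connection in root.iter('connection'):  # every <connection>, at any depth
        for child in connection:  # direct children: <id>, <classes>
            if child.tag == 'id':
                lines.append(child.text)
            elif child.tag == 'classes':
                # each <class> directly under this <classes>
                lines.extend(cls.text for cls in child if cls.tag == 'class')
    return lines


if __name__ == '__main__':
    for line in find_classes(XML):
        print(line)
```

Walking top-down keeps every <class> tied to the <connection> it was found in, without any upward traversal.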
Solution 2:
The problem is your XPath expression: it knows nothing about the logic of your nested for loop. The result of
tree.xpath('./connections/connection/classes/class')
is a list of every element that matches the path you pass to xpath(). In this case, every one of your <class> elements matches that pattern, so all of them are selected, regardless of which <connection> they belong to (this is actually the power of XPath: it can select all of those nodes in one call when your data is stored this way).
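The difference between one path from the root and a path evaluated per connection can be sketched as follows. It swaps in Python's built-in xml.etree.ElementTree for lxml, because its findall() accepts the simple path syntax used here; lxml's xpath() returns the same elements for these paths. The XML is a hypothetical reconstruction of the question's data, assuming a <data> root that contains <connections> so the expression quoted above matches from the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; <data> is an assumed root element
XML = """<data><connections>
  <connection><id>10</id><classes><class>DVD</class><class>DVD_TEST</class></classes></connection>
  <connection><id>20</id><classes><class>TV</class></classes></connection>
</connections></data>"""

root = ET.fromstring(XML)

# One path from the root: every <class> in the document, flattened into one list
all_classes = [c.text for c in root.findall('./connections/connection/classes/class')]

# Fix: select each <connection> first, then evaluate a path relative to it
classes_by_id = {}
for connection in root.findall('./connections/connection'):
    conn_id = connection.findtext('id')
    classes_by_id[conn_id] = [c.text for c in connection.findall('./classes/class')]

print(all_classes)    # ['DVD', 'DVD_TEST', 'TV']
print(classes_by_id)  # {'10': ['DVD', 'DVD_TEST'], '20': ['TV']}
```

Because each inner findall() starts from a single <connection>, its <class> results cannot bleed across connections.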