
How To Loop Through A Complicated Xml Structure In Order To Transform It To A Pandas Data Frame

I am trying to extract information from an XML file with a complicated, nested structure and transform it into a pandas DataFrame.

Solution 1:

import xmltodict

with open('change_user.xml') as fd:
    doc = xmltodict.parse(fd.read())

doc['change']['log']  # use the tag names as keys to navigate the nested dicts

Prints:

OrderedDict([('@id', '333'),
             ('@action', 'create'),
             ('property',
              [OrderedDict([('@id', '52122'),
                            ('old', None),
                            ('new',
                             OrderedDict([('item',
                                           [OrderedDict([('@id', '562622'),
                                                         ('@toString',
                                                          'Test')]),
                                            OrderedDict([('@id', '033362'),
                                                     ('@toString',
                                                      'Test2')])])]))]),
           OrderedDict([('@id', '33563'),
                        ('new',
                         OrderedDict([('item',
                                       OrderedDict([('@id', '44322'),
                                                    ('@toString',
                                                     'Test3')]))]))]),
           OrderedDict([('@id', '21733'),
                        ('old', None),
                        ('new',
                         OrderedDict([('@id', '12341212'),
                                      ('@toString', 'Test4')]))])])])

Source: http://docs.python-guide.org/en/latest/scenarios/xml/
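
The nested result above still has to be flattened before pandas can use it. Below is a minimal sketch of one way to do that, not part of the original answer. It assumes the structure is exactly what the printed output shows: 'property' is a list, and 'new' either wraps one or more 'item' entries or carries '@id' and '@toString' itself. The helper names as_list and log_to_rows and the column names are made up for illustration, and plain dicts stand in for the OrderedDicts that xmltodict returned here.

```python
import pandas as pd

def as_list(value):
    """Return value as a list: None -> [], a single value -> [value]."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def log_to_rows(log):
    """Flatten one parsed <log> mapping into one row per new value."""
    rows = []
    for prop in as_list(log.get('property')):
        new = prop.get('new') or {}
        if 'item' in new:
            # <new> wraps one <item> (a dict) or several (a list of dicts)
            items = as_list(new['item'])
        else:
            # <new> carries the attributes itself, or is empty
            items = as_list(new or None)
        for item in items:
            rows.append({
                'log_id': log.get('@id'),
                'action': log.get('@action'),
                'property_id': prop.get('@id'),
                'new_id': item.get('@id'),
                'new_value': item.get('@toString'),
            })
    return rows

# Hand-copied from the printed output above
log = {
    '@id': '333',
    '@action': 'create',
    'property': [
        {'@id': '52122', 'old': None,
         'new': {'item': [{'@id': '562622', '@toString': 'Test'},
                          {'@id': '033362', '@toString': 'Test2'}]}},
        {'@id': '33563',
         'new': {'item': {'@id': '44322', '@toString': 'Test3'}}},
        {'@id': '21733', 'old': None,
         'new': {'@id': '12341212', '@toString': 'Test4'}},
    ],
}

df = pd.DataFrame(log_to_rows(log))
print(df)
```

With the real file, log_to_rows(doc['change']['log']) should work the same way, provided the file really has the shape shown above. Wrapping single values in a list is what lets one loop handle both the one-item and the many-item case.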

Solution 2:

Here is one way to proceed. This example fills only two columns; the rest can be added in the same way.

Step 1

Parse the XML with ElementTree

import xml.etree.ElementTree as ET
import datetime as date

def output_xml_parsing(xml):
    # Read the file and close it again once it has been parsed
    with open(xml) as f:
        root = ET.XML(f.read())
    # 'user' and 'timestamp' are attributes of the root element
    Change_User = root.attrib.get('user')
    timestamp = root.attrib.get('timestamp')
    return Change_User, timestamp

Step 2

Create a DataFrame and add values to it. This example uses only two columns, but you can extend it with more.

import pandas as pd

def add_data_to_dataframe(xml):
    # Change_User and timestamp are the values returned by output_xml_parsing above
    Change_User, timestamp = output_xml_parsing(xml)

    # Dictionary that populates the DataFrame: keys are column names, values are the column contents
    data = {
        'Change_User': [Change_User],
        'timestamp': [timestamp]
    }
    # Build a one-row DataFrame indexed by today's date
    report_dataframe = pd.DataFrame(data, index=[date.date.today()])
    return report_dataframe

Step 3

Calling the function

ab=add_data_to_dataframe(r'D:\Users\pankaj-m\Desktop\Stack overflow questions\xml\data.xml')
print(ab)
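
The steps above read only the two root attributes. As a rough sketch of how the same ElementTree approach could be extended to one row per changed value, the example below loops over the nested elements. The XML layout is an assumption inferred from the printed dictionary in Solution 1 and from the 'user' and 'timestamp' attributes used in Step 1. The values for those two attributes are placeholders, and the function name xml_to_dataframe and the extra column names are hypothetical.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical sample: layout inferred from the printed dictionary,
# 'user' and 'timestamp' values are placeholders.
SAMPLE_XML = """
<change user="example_user" timestamp="2020-01-01T00:00:00">
  <log id="333" action="create">
    <property id="52122">
      <old/>
      <new>
        <item id="562622" toString="Test"/>
        <item id="033362" toString="Test2"/>
      </new>
    </property>
    <property id="33563">
      <new>
        <item id="44322" toString="Test3"/>
      </new>
    </property>
    <property id="21733">
      <old/>
      <new id="12341212" toString="Test4"/>
    </property>
  </log>
</change>
"""

def xml_to_dataframe(xml_text):
    """Build one row per new value found under each <property>."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for log in root.iter('log'):
        for prop in log.findall('property'):
            new = prop.find('new')
            if new is None:
                continue  # nothing to record for this property
            # Use the <item> children when present, otherwise <new> itself
            items = new.findall('item') or [new]
            for item in items:
                rows.append({
                    'Change_User': root.get('user'),
                    'timestamp': root.get('timestamp'),
                    'log_id': log.get('id'),
                    'action': log.get('action'),
                    'property_id': prop.get('id'),
                    'new_id': item.get('id'),
                    'new_value': item.get('toString'),
                })
    return pd.DataFrame(rows)

df = xml_to_dataframe(SAMPLE_XML)
print(df)
```

Collecting plain dictionaries in a list and creating the DataFrame once at the end is usually simpler than appending to a DataFrame row by row.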
