How To Loop Through A Complicated Xml Structure In Order To Transform It To A Pandas Data Frame
I am trying to extract information from an XML file and transform it into a pandas DataFrame for the following XML structure:
import xmltodict

with open('change_user.xml') as fd:
    doc = xmltodict.parse(fd.read())

doc['change']['log']  # use the tag names as keys to navigate the nested dicts
Prints:
OrderedDict([('@id', '333'),
             ('@action', 'create'),
             ('property',
              [OrderedDict([('@id', '52122'),
                            ('old', None),
                            ('new',
                             OrderedDict([('item',
                                           [OrderedDict([('@id', '562622'),
                                                         ('@toString',
                                                          'Test')]),
                                            OrderedDict([('@id', '033362'),
                                                         ('@toString',
                                                          'Test2')])])]))]),
               OrderedDict([('@id', '33563'),
                            ('new',
                             OrderedDict([('item',
                                           OrderedDict([('@id', '44322'),
                                                        ('@toString',
                                                         'Test3')]))]))]),
               OrderedDict([('@id', '21733'),
                            ('old', None),
                            ('new',
                             OrderedDict([('@id', '12341212'),
                                          ('@toString', 'Test4')]))])])])
Source: http://docs.python-guide.org/en/latest/scenarios/xml/
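The printed structure shows why a plain loop is awkward: 'item' is a list for the first property, a single dict for the second, and missing for the third, where the attributes sit directly on 'new'. A minimal sketch of one way to flatten it into rows follows. The helper names (as_list, flatten_log, to_frame) and the column names are my own assumptions, and the sample dict is copied from the output above using plain dicts in place of OrderedDict.

```python
def as_list(value):
    """Return [] for None, the list itself for a list, else a one-element list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def flatten_log(log):
    """Return one flat dict per item found under each property of a log."""
    rows = []
    for prop in as_list(log.get('property')):
        new = prop.get('new') or {}
        # Items live under 'item' when present; otherwise the attributes
        # are attached to 'new' itself, as in the third property.
        items = as_list(new.get('item')) if 'item' in new else [new]
        for item in items:
            rows.append({
                'log_id': log.get('@id'),
                'action': log.get('@action'),
                'property_id': prop.get('@id'),
                'item_id': item.get('@id'),
                'item_name': item.get('@toString'),
            })
    return rows


def to_frame(rows):
    """Turn the flat rows into a DataFrame (pandas is only needed here)."""
    import pandas as pd
    return pd.DataFrame(rows)


# Same content as doc['change']['log'] printed above.
log = {
    '@id': '333',
    '@action': 'create',
    'property': [
        {'@id': '52122', 'old': None,
         'new': {'item': [{'@id': '562622', '@toString': 'Test'},
                          {'@id': '033362', '@toString': 'Test2'}]}},
        {'@id': '33563',
         'new': {'item': {'@id': '44322', '@toString': 'Test3'}}},
        {'@id': '21733', 'old': None,
         'new': {'@id': '12341212', '@toString': 'Test4'}},
    ],
}

rows = flatten_log(log)
```

Calling to_frame(rows) should then give a DataFrame with one row per item. With a real file you would pass doc['change']['log'] instead of the literal dict.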
Solution 2:
This is one way to proceed. The example below covers two columns; the remaining ones follow the same pattern.
Step 1
Parse the XML with ElementTree
import xml.etree.ElementTree as ET
import datetime as date

def output_xml_parsing(xml):
    # Parse the file and read two attributes from the root element
    root = ET.parse(xml).getroot()
    Change_User = root.attrib.get('user')
    timestamp = root.attrib.get('timestamp')
    return Change_User, timestamp
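To see what the attribute lookups return without needing a file on disk, here is a small sketch that parses an inline string. The sample XML and its values are invented for illustration; only the 'user' and 'timestamp' attribute names and the change/log tags come from the code in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the attribute values are made up.
SAMPLE = ('<change user="jdoe" timestamp="2017-01-01T10:00:00">'
          '<log id="333" action="create"/>'
          '</change>')

root = ET.fromstring(SAMPLE)

# attrib is a plain dict, so .get() returns None for a missing attribute
change_user = root.attrib.get('user')
timestamp = root.attrib.get('timestamp')
missing = root.attrib.get('no_such_attribute')

# Child elements are reached with find(); Element.get() reads an attribute
log = root.find('log')
log_id = log.get('id')
```

Because a missing attribute comes back as None rather than raising, the resulting DataFrame column simply holds a missing value for that row.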
Step 2
Create a DataFrame and add the values to it. This example uses only two columns, but it can be extended.
import pandas as pd

def add_data_to_dataframe(xml):
    # Values returned by output_xml_parsing are stored in Change_User and timestamp
    Change_User, timestamp = output_xml_parsing(xml)
    # Dictionary that populates the DataFrame: each key is a column name,
    # each value is a one-element list holding that column's value
    data = {
        'Change_User': [Change_User],
        'timestamp': [timestamp]
    }
    # Build the DataFrame, labelling its single row with the current time
    report_dataframe = pd.DataFrame(data, index=[date.datetime.now()])
    return report_dataframe
Step 3
Calling the function
ab=add_data_to_dataframe(r'D:\Users\pankaj-m\Desktop\Stack overflow questions\xml\data.xml')
print(ab)
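To go beyond the two root attributes, the same ElementTree approach can loop over the nested elements and collect one row per item. This is only a sketch: the sample XML is my reconstruction from the xmltodict output shown in the question (tag and attribute names are inferred from the dict keys), and xml_to_rows and the column names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Reconstructed sample; the real file may differ. user/timestamp are made up.
SAMPLE = """<change user="jdoe" timestamp="2017-01-01T10:00:00">
  <log id="333" action="create">
    <property id="52122">
      <old/>
      <new>
        <item id="562622" toString="Test"/>
        <item id="033362" toString="Test2"/>
      </new>
    </property>
    <property id="33563">
      <new><item id="44322" toString="Test3"/></new>
    </property>
    <property id="21733">
      <old/>
      <new id="12341212" toString="Test4"/>
    </property>
  </log>
</change>"""


def xml_to_rows(xml_text):
    """Return one dict per item, carrying the parent attributes along."""
    root = ET.fromstring(xml_text)
    rows = []
    for log in root.iter('log'):
        for prop in log.findall('property'):
            new = prop.find('new')
            if new is None:
                continue  # nothing to report for this property
            # Use the <item> children when present, else <new> itself
            items = new.findall('item') or [new]
            for item in items:
                rows.append({
                    'Change_User': root.get('user'),
                    'timestamp': root.get('timestamp'),
                    'log_id': log.get('id'),
                    'property_id': prop.get('id'),
                    'item_id': item.get('id'),
                    'item_name': item.get('toString'),
                })
    return rows


rows = xml_to_rows(SAMPLE)
```

Since pd.DataFrame accepts a list of dicts, pd.DataFrame(rows) turns these rows into a table with one column per key.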