Parse Through An Xml In Python

I am looking to parse the following XML: http://charts.realclearpolitics.com/charts/1044.xml. I want the result in a data frame with 3 columns: Date, Approve, Disapp

Solution 1:

Here is one way to do it with lxml and XPath:

from lxml import etree
import pandas as pd

# Parse the XML document directly from its URL
tree = etree.parse("http://charts.realclearpolitics.com/charts/1044.xml")

# Dates come from the series; empty graph values default to 0.0
date = [s.text for s in tree.xpath("series/value")]
approve = [float(s.text) if s.text else 0.0 for s in tree.xpath("graphs/graph[@title='Approve']/value")]
disapprove = [float(s.text) if s.text else 0.0 for s in tree.xpath("graphs/graph[@title='Disapprove']/value")]

# All three lists must line up before building the frame
assert len(date) == len(approve) == len(disapprove)

finalresult = pd.DataFrame({'Date': date, 'Approve': approve, 'Disapprove': disapprove})
print(finalresult)

Output:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1727 entries, 0 to 1726
Data columns (total 3 columns):
Date          1727  non-null values
Approve       1727  non-null values
Disapprove    1727  non-null values
dtypes: float64(2), object(1)
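
The answer above pairs dates and values by position and turns empty values into 0.0, which can be mistaken for a real reading. As a rough sketch of an alternative, the snippet below uses only the standard library's xml.etree.ElementTree on a small inline sample, so it runs without network access. The sample document, its made-up dates and numbers, the xid attribute used to match values to dates, and the helper names to_float and parse_chart are all assumptions for illustration; the real feed may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: made-up values in an ASSUMED layout
# (series/value and graphs/graph/value elements sharing an "xid" attribute).
SAMPLE_XML = """
<chart>
  <series>
    <value xid="0">1/27/2009</value>
    <value xid="1">1/28/2009</value>
    <value xid="2">1/29/2009</value>
  </series>
  <graphs>
    <graph title="Approve">
      <value xid="0">63.3</value>
      <value xid="1">63.5</value>
      <value xid="2"></value>
    </graph>
    <graph title="Disapprove">
      <value xid="0">20.0</value>
      <value xid="1">19.3</value>
      <value xid="2">20.1</value>
    </graph>
  </graphs>
</chart>
"""


def to_float(text):
    """Return the text as a float, or None when the element is empty."""
    return float(text) if text and text.strip() else None


def parse_chart(xml_text):
    """Build one dict per date, matching graph values to dates by xid."""
    root = ET.fromstring(xml_text.strip())

    def graph_values(title):
        # Map xid -> numeric value for the graph with the given title
        path = "graphs/graph[@title='{}']/value".format(title)
        return {v.get("xid"): to_float(v.text) for v in root.findall(path)}

    approve = graph_values("Approve")
    disapprove = graph_values("Disapprove")

    rows = []
    for v in root.findall("series/value"):
        xid = v.get("xid")
        rows.append({
            "Date": v.text,
            "Approve": approve.get(xid),        # None if missing or empty
            "Disapprove": disapprove.get(xid),  # None if missing or empty
        })
    return rows


rows = parse_chart(SAMPLE_XML)
for row in rows:
    print(row)
```

If pandas is available, passing the list to pd.DataFrame(rows, columns=['Date', 'Approve', 'Disapprove']) should give the three-column frame, with the missing entries showing up as NaN instead of 0.0.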
