BeautifulSoup Parsing XML to Table
I'm back with another issue. I'm really new to parsing XML with BeautifulSoup and have been stuck on this problem for two weeks now. I would appreciate your help. I have this structure:
Solution 1:
The core is a list comprehension over the Bloc elements with an embedded dict comprehension over each Bloc's attributes. The page is added by appending to that dict of attributes: navigate to the parent element and read the required attribute (number).
Column order is based on the order in which the attributes are first seen.
import pandas as pd
from bs4 import BeautifulSoup
xml = """<detail><page number="01"><Bloc code="AF" A="000000000002550" B="000000000002550"/><Bloc code="AH" A="000000000035826" C="000000000035826" D="000000000035826"/><Bloc code="AR" A="000000000026935" B="000000000024503" C="000000000002431" D="000000000001669"/></page><page number="02"><Bloc code="DA" A="000000000038486" B="000000000038486"/><Bloc code="DD" A="000000000003849" B="000000000003849"/><Bloc code="EA" A="000000000001029"/><Bloc code="EC" A="000000000063797" B="000000000082427"/></page><page number="03"><Bloc code="FD" C="000000000574042" D="000000000610740"/><Bloc code="GW" C="000000000052677" D="000000000075362"/></page></detail>"""
# html.parser lowercases tag and attribute names, hence "bloc" below and the lowercase columns
soup = BeautifulSoup(xml, "html.parser")
# one dict per Bloc: its own attributes plus the number of the parent page
df = pd.DataFrame([{**b.attrs, "page": b.parent["number"]}
                   for b in soup.find_all("bloc")])
Output:
code  a                b                page  c                d
AF    000000000002550  000000000002550  01    NaN              NaN
AH    000000000035826  NaN              01    000000000035826  000000000035826
AR    000000000026935  000000000024503  01    000000000002431  000000000001669
DA    000000000038486  000000000038486  02    NaN              NaN
DD    000000000003849  000000000003849  02    NaN              NaN
EA    000000000001029  NaN              02    NaN              NaN
EC    000000000063797  000000000082427  02    NaN              NaN
FD    NaN              NaN              03    000000000574042  000000000610740
GW    NaN              NaN              03    000000000052677  000000000075362
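To see why the columns come out in that order, here is a minimal sketch using only the standard library. It assumes a shortened version of the sample XML, and the helper name first_seen_columns is hypothetical (it is not part of pandas or BeautifulSoup). It imitates what happens when a table is built from a list of dicts: each new key gets a column the first time it appears, and missing values are left empty.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same shape as the XML above (assumption, for illustration only).
sample = (
    '<detail>'
    '<page number="01">'
    '<Bloc code="AF" A="000000000002550" B="000000000002550"/>'
    '<Bloc code="AH" A="000000000035826" C="000000000035826" D="000000000035826"/>'
    '</page>'
    '<page number="03">'
    '<Bloc code="FD" C="000000000574042" D="000000000610740"/>'
    '</page>'
    '</detail>'
)


def first_seen_columns(rows):
    """Hypothetical helper: collect the keys of all rows in first-seen order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns


root = ET.fromstring(sample)

# One dict per Bloc: its own attributes first, then the parent page number.
rows = [
    {**bloc.attrib, "page": page.attrib["number"]}
    for page in root.iter("page")
    for bloc in page.iter("Bloc")
]

columns = first_seen_columns(rows)

# Attributes a Bloc does not have become None here; pandas would show NaN.
table = [[row.get(col) for col in columns] for row in rows]
```

The first Bloc contributes code, A, B and then page, so page ends up ahead of C and D, which only show up on the second Bloc. If page should come first, building each dict as {"page": ..., **attributes} is one way to get that.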
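ElementTree
Very similar to the BeautifulSoup version. ElementTree keeps the original case of tag and attribute names, so the element is looked up as "Bloc" and the columns come out as A, B, C, D.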
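import pandas as pd
import xml.etree.ElementTree as ET
# xml is the same string as in the BeautifulSoup example
root = ET.fromstring(xml)
# walk page -> Bloc so the page number is at hand without navigating to a parent
df2 = pd.DataFrame([{**b.attrib, "page": p.attrib["number"]}
                    for p in root.iter("page")
                    for b in p.iter("Bloc")])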
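The two snippets look the element up under different names ("bloc" with BeautifulSoup, "Bloc" with ElementTree). A small sketch of the ElementTree side, assuming a one-element sample: tag matching is case-sensitive, so the lowercase name finds nothing, and attribute names keep their original case.

```python
import xml.etree.ElementTree as ET

# One-element sample in the same shape as the XML above (assumption, for illustration only).
sample = '<detail><page number="01"><Bloc code="AF" A="000000000002550"/></page></detail>'
root = ET.fromstring(sample)

# ElementTree matches tag names exactly as written in the document.
lower_matches = list(root.iter("bloc"))
exact_matches = list(root.iter("Bloc"))

# Attribute names are kept as written too, so "A" stays uppercase.
attribute_names = list(exact_matches[0].attrib)
```

This is why df2 has columns code, A, B, page, C, D while the BeautifulSoup output shows them in lowercase.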