
How To Fetch Content Of Xml Root Element In Python?

I have an XML file, e.g.: First line. Second line. As an output I want to get: '\nFirst lin

Solution 1:

The first thing I came up with:

from xml.etree.ElementTree import fromstring, tostring

source = '''<?xml version="1.0" encoding="UTF-8"?>
<root>
    First line.<br/>Second line.
</root>
'''

xml = fromstring(source)
# encoding='unicode' makes tostring() return str rather than bytes (Python 3)
result = tostring(xml, encoding='unicode').lstrip('<%s>' % xml.tag).rstrip('</%s>' % xml.tag)

print(result)

# output:
#
#     First line.<br />Second line.
#

But it's not a truly general-purpose approach, since it fails if the opening root tag (<root>) contains any attributes.

UPDATE: This approach has another issue. Since lstrip and rstrip treat their argument as a set of characters and strip any combination of them, rather than a literal prefix or suffix, you can run into this problem:

# input:
<?xml version="1.0" encoding="UTF-8"?><root><p>First line</p></root>

# result:
p>First line</p
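Both failure modes are easy to reproduce. The following is a minimal sketch, assuming Python 3; the helper name strip_root_naive is made up for illustration and simply wraps the lstrip/rstrip approach from above:

```python
from xml.etree.ElementTree import fromstring, tostring


def strip_root_naive(source):
    """Reproduce the lstrip/rstrip approach so its failure modes can be inspected."""
    xml = fromstring(source)
    text = tostring(xml, encoding='unicode')
    return text.lstrip('<%s>' % xml.tag).rstrip('</%s>' % xml.tag)


# A child tag directly after <root>: the leading '<' of <p> and the trailing
# '>' of </p> belong to the stripped character sets, so they are lost too.
nested = strip_root_naive('<root><p>First line</p></root>')
print(repr(nested))

# An attribute on the root: the space after 'root' is not in the character
# set, so stripping stops inside the opening tag.
with_attr = strip_root_naive('<root attr1="val1">First line.</root>')
print(repr(with_attr))
```

In the first case the surrounding angle brackets of the child element are eaten; in the second, the attribute and the closing '>' of the opening tag are left in the result.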

If you really need only the literal string between the opening and closing tags (as you mentioned in the comment), you can use this:

from xml.etree.ElementTree import fromstring, tostring

source = '''<?xml version="1.0" encoding="UTF-8"?>
<root attr1="val1">
    First line.<br/>Second line.
</root>
'''

# The following two lines are needed just to cut
# the declaration, doctypes, etc.
xml = fromstring(source)
xml_str = tostring(xml, encoding='unicode')

# End of the opening tag and start of the closing tag.
start = xml_str.index('>')
end = xml_str.rindex('<')

result = xml_str[start + 1:end]

Not the most elegant approach, but unlike the previous one it works correctly with attributes in the opening tag, as well as with any valid XML document.
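A possible alternative that avoids searching the serialized string for angle brackets is to rebuild the content from the parsed tree. This is only a sketch, assuming Python 3; inner_xml is a hypothetical helper name, not part of ElementTree:

```python
from xml.etree.ElementTree import fromstring, tostring
from xml.sax.saxutils import escape


def inner_xml(source):
    """Return the serialized content between the root's opening and closing tags."""
    root = fromstring(source)
    # root.text is the (unescaped) text before the first child, or None;
    # escape it again so '&', '<' and '>' are serialized consistently.
    parts = [escape(root.text or '')]
    # tostring() on a child includes the child's tail text, so the text
    # that follows each child element is kept as well.
    parts.extend(tostring(child, encoding='unicode') for child in root)
    return ''.join(parts)


source = '''<?xml version="1.0" encoding="UTF-8"?>
<root attr1="val1">
    First line.<br/>Second line.
</root>
'''
content = inner_xml(source)
print(repr(content))
```

Note that ElementTree serializes the empty element as <br /> (with a space), and that for documents using namespaces each serialized child would likely carry its own namespace declaration, so the result may differ textually from the original source.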

Solution 2:

Parse from file:

from xml.etree.ElementTree import parse
tree = parse('yourxmlfile.xml')
print(tree.getroot().text)

Parse from string:

from xml.etree.ElementTree import fromstring
print(fromstring(yourxmlstr).text)
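Keep in mind that .text holds only the text up to the first child element; text that follows a child is stored in that child's .tail. A short sketch, assuming Python 3 and the sample document from Solution 1, shows where each piece ends up:

```python
from xml.etree.ElementTree import fromstring

root = fromstring('<root>\n    First line.<br/>Second line.\n</root>')

leading_text = root.text               # text before the first child element
br_tail = root[0].tail                 # text after <br/>, stored on the child
all_text = ''.join(root.itertext())    # all text pieces, markup dropped

print(repr(leading_text))
print(repr(br_tail))
print(repr(all_text))
```

If only the first text fragment is wanted, .text is enough; if all text without the markup is wanted, joining itertext() is one option.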
