How To Find Text's Parent Node?

June 11, 2024

If I use:

import requests
from lxml import html

response = requests.get(url='someurl')
tree = html.document_fromstring(response.text)
all_text = tree.xpath('//text()')  # whic

Solution 1:

You can use the getparent() method for this purpose. For example:

.....
.....
all_text = tree.xpath('//text()')

first_text = all_text[0]
parent_element = first_text.getparent()

print(html.tostring(parent_element))

Note that getparent() may not behave as you expect when the text node comes after an element node within the same parent element. In lxml's tree model, such text is stored as the tail of the preceding element rather than as a child of the containing element, so getparent() returns the preceding element. The example below illustrates this:

from lxml import html
raw = '''<div>
    <span>foo</span>
    bar
</div>'''
root = html.fromstring(raw)
texts = root.xpath('//text()[normalize-space()]')
print([t for t in texts])
# output : ['foo', '\n    bar\n']

print([html.tostring(e.getparent()) for e in texts])
# output : [b'<span>foo</span>\n    bar\n', b'<span>foo</span>\n    bar\n']
# note that calling getparent() on 'bar' returns the <span>, not the <div>
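
If what you want is the element that actually encloses the text in the markup, one possible approach is to check the is_tail flag that lxml sets on XPath string results and step up one extra level for tail text. The following is only a minimal sketch: the helper name containing_element and the sample markup are made up for illustration, and it assumes the text results come from an lxml XPath query under Python 3.

```python
from lxml import html


def containing_element(text_node):
    """Return the element that encloses an XPath text result in the markup.

    Hypothetical helper, not part of lxml itself.
    """
    parent = text_node.getparent()
    # Tail text is attached to the preceding sibling element,
    # so the enclosing element is one level further up.
    if text_node.is_tail:
        return parent.getparent()
    return parent


raw = '<div><span>foo</span>bar</div>'
root = html.fromstring(raw)
texts = root.xpath('//text()[normalize-space()]')

tags = [containing_element(t).tag for t in texts]
print(tags)  # ['span', 'div']
```

Here 'foo' is the .text of the <span>, so its parent is returned directly, while 'bar' is the .tail of the <span>, so the helper goes up to the <div>.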
