How To Find Text's Parent Node?
If I use:
import requests
from lxml import html
response = requests.get(url='someurl')
tree = html.document_fromstring(response.text)
all_text = tree.xpath('//text()') # whic
Solution 1:
You can use the getparent() method for this purpose, for example:
.....
.....
all_text = tree.xpath('//text()')
first_text = all_text[0]
parent_element = first_text.getparent()
print(html.tostring(parent_element))
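For a version that runs without a network request, here is a minimal sketch. The sample markup and the variable names `sample` and `pairs` are assumptions made for illustration and are not part of the original question.

```python
from lxml import html

# Hypothetical sample markup, used in place of a live HTTP response.
sample = '<html><body><h1>Title</h1><p>Hello</p></body></html>'
tree = html.document_fromstring(sample)

# Pair every text node with the tag of the element getparent() returns.
pairs = [(str(t), t.getparent().tag) for t in tree.xpath('//text()')]
print(pairs)  # [('Title', 'h1'), ('Hello', 'p')]
```

In this sample every text node is the first thing inside its element, so getparent() returns the enclosing element each time.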
Note that the behavior of getparent() might not be what you expect when the text node comes after a child element inside the same parent. In the tree model implemented by lxml, such text is stored as the tail of the preceding element rather than as a child of the containing element, so getparent() returns the preceding element. The example below illustrates this:
from lxml import html

# Tabs are written as explicit escapes so the whitespace matches the output below.
raw = '<div>\n\t<span>foo</span>\n\tbar\n</div>'
root = html.fromstring(raw)
texts = root.xpath('//text()[normalize-space()]')
print([t for t in texts])
# output : ['foo', '\n\tbar\n']
print([html.tostring(t.getparent(), encoding='unicode') for t in texts])
# output : ['<span>foo</span>\n\tbar\n', '<span>foo</span>\n\tbar\n']
# see that calling getparent() on 'bar' returns the <span>, not the <div>
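If what you want is the element that actually encloses the text, one possible workaround is to check the is_tail flag that lxml sets on the string results returned by XPath, and step up one more level for tail text. This is a sketch rather than the only way to do it. The helper name `containing_element` is made up for this example.

```python
from lxml import html

def containing_element(text_node):
    """Return the element that encloses text_node in the markup."""
    owner = text_node.getparent()
    # For tail text, getparent() is the preceding sibling, so go up once more.
    return owner.getparent() if text_node.is_tail else owner

raw = '<div>\n\t<span>foo</span>\n\tbar\n</div>'
root = html.fromstring(raw)
texts = root.xpath('//text()[normalize-space()]')

tags = [containing_element(t).tag for t in texts]
print(tags)  # ['span', 'div']
```

Here 'foo' is ordinary text of the span, while 'bar' is the span's tail, so the helper returns the div for it.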