How To Get Immediate Parent Node With Scrapy In Python?

Question

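I am new to Scrapy. I want to crawl some data from the web, and I have HTML documents like the ones below.

dom style 1:

<div class="user-info">
    <p class="user-name">
        something in p tag
    </p>
    text data I want
</div>

dom style 2:

<div class="user-info">
    <div>
        <p class="user-img">
            something in p tag
        </p>
        something in div tag
    </div>
    <div>
        <p class="user-name">
            something in p tag
        </p>
        text data I want
    </div>
</div>

In both cases I want to get "text data I want", the text that sits directly inside the immediate parent of <p class="user-name">. How can I do that with Scrapy?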

Solution 1:

With XPath you can traverse the document tree in every direction (parent, sibling, child, etc.), whereas CSS selectors do not support moving up to a parent. In your case you can get a node's parent with the XPath .. (parent) notation:

//p[@class='user-name']/../text()

Explanation:
//p[@class='user-name'] - find <p> nodes whose class value is user-name.
/.. - select each matched node's parent.
/text() - select the text nodes that are direct children of the current node (here, the parent).

This XPath should work in both of the cases you described.
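For comparison, here is a minimal sketch of the same parent lookup using only Python's standard library (xml.etree.ElementTree) instead of Scrapy, so it runs without Scrapy installed. This is a swapped-in stand-in, not Scrapy's API: ElementTree supports .. in its limited XPath subset but not text(), so the parent's direct text is assembled from parent.text and the tail of each child element. The helper name parent_text is hypothetical. The sketch also shows that .. lands on the immediate parent (the inner <div> in style 2, not div.user-info) and that the parent's direct text nodes include whitespace-only indentation. That is why it strips and filters them; with Scrapy you would likely want .getall() plus stripping rather than .get(), which returns only the first match (here the indentation before <p>).

```python
import xml.etree.ElementTree as ET

# The two DOM layouts from the question.
STYLE_1 = """<div class="user-info">
    <p class="user-name">
        something in p tag
    </p>
    text data I want
</div>"""

STYLE_2 = """<div class="user-info">
    <div>
        <p class="user-img">
            something in p tag
        </p>
        something in div tag
    </div>
    <div>
        <p class="user-name">
            something in p tag
        </p>
        text data I want
    </div>
</div>"""


def parent_text(markup):
    """Hypothetical helper approximating //p[@class='user-name']/../text():
    return the non-blank text nodes directly inside the immediate parent."""
    root = ET.fromstring(markup)
    # ElementTree's XPath subset supports '..' to step up to the parent.
    parents = root.findall(".//p[@class='user-name']/..")
    texts = []
    for parent in parents:
        # Direct text children = the parent's leading text plus each child's tail.
        chunks = [parent.text] + [child.tail for child in parent]
        # Drop whitespace-only chunks (indentation) and trim the rest.
        texts.extend(c.strip() for c in chunks if c and c.strip())
    return texts


print(parent_text(STYLE_1))  # ['text data I want']
print(parent_text(STYLE_2))  # ['text data I want']
```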

Solution 2:

What about using the following-sibling axis?

>>> import scrapy
>>> s = scrapy.Selector(text='''<div class="user-info">
...     <p class="user-name">
...         something in p tag
...     </p>
...     text data I want
... </div>''')
>>> username = s.css('p.user-name')[0]
>>> username.xpath('following-sibling::text()[1]').get()
'\n    text data I want\n'
>>>
>>> s2 = scrapy.Selector(text='''<div class="user-info">
...     <div>
...         <p class="user-img">
...             something in p tag
...         </p>
...         something in div tag
...     </div>
...     <div>
...         <p class="user-name">
...             something in p tag
...         </p>
...         text data I want
...     </div>
... </div>''')
>>> username = s2.css('p.user-name')[0]
>>> username.xpath('following-sibling::text()[1]').get()
'\n        text data I want\n    '
>>>
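The following-sibling::text()[1] step can be sketched the same way with the standard library, again as a stand-in for Scrapy rather than its API. In ElementTree, the text that immediately follows an element is stored as that element's tail, so the first following text sibling is the first non-empty tail among the element and its later siblings. ElementTree elements have no parent pointers, so the sketch builds a child-to-parent map first. The helper name first_following_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two DOM layouts from the question.
STYLE_1 = """<div class="user-info">
    <p class="user-name">
        something in p tag
    </p>
    text data I want
</div>"""

STYLE_2 = """<div class="user-info">
    <div>
        <p class="user-img">
            something in p tag
        </p>
        something in div tag
    </div>
    <div>
        <p class="user-name">
            something in p tag
        </p>
        text data I want
    </div>
</div>"""


def first_following_text(markup):
    """Hypothetical helper emulating following-sibling::text()[1]
    for <p class="user-name"> (not part of Scrapy)."""
    root = ET.fromstring(markup)
    # ElementTree elements do not know their parent, so build a map once.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    elem = root.find(".//p[@class='user-name']")
    siblings = list(parent_of[elem])
    # Text after an element lives in that element's tail; the first non-empty
    # tail from elem onward is the first text node among its following siblings.
    for sib in siblings[siblings.index(elem):]:
        if sib.tail:
            return sib.tail
    return None


print(repr(first_following_text(STYLE_1)))  # '\n    text data I want\n'
print(repr(first_following_text(STYLE_2)))  # '\n        text data I want\n    '
```

The raw strings match the Scrapy output shown above, surrounding whitespace included, so you would typically call .strip() on the result.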
