Scrapy: What's The Correct Way To Use start_requests()?

January 03, 2024

This is how my spider is set up:

    class CustomSpider(CrawlSpider):
        name = 'custombot'
        allowed_domains = ['www.domain.com']
        start_urls = ['http://www.domain.com/some-url']

Solution 1:

According to the documentation for start_requests, overriding start_requests means that the URLs defined in start_urls are ignored:

"This is the method called by Scrapy when the spider is opened for scraping when no particular URLs are specified. If particular URLs are specified, the make_requests_from_url() is used instead to create the Requests. [...] If you want to change the Requests used to start scraping a domain, this is the method to override."

If you want to just scrape from /some-url, then remove start_requests. If you want to scrape from both, then add /some-url to the start_urls list.
