Scrapy: What's the Correct Way to Use start_requests()?
This is how my spider is set up:

    class CustomSpider(CrawlSpider):
        name = 'custombot'
        allowed_domains = ['www.domain.com']
        start_urls = ['http://www.domain.com/some-url']
Solution 1:
According to the documentation for start_requests, overriding start_requests means that the URLs defined in start_urls are ignored:
This is the method called by Scrapy when the spider is opened for scraping when no particular URLs are specified. If particular URLs are specified, the make_requests_from_url() is used instead to create the Requests. [...] If you want to change the Requests used to start scraping a domain, this is the method to override.
If you only want to scrape from /some-url, remove start_requests. If you want to scrape from both, add /some-url to the start_urls list.