Scraping A JSON Response With Scrapy
How do you use Scrapy to scrape web requests that return JSON? For example, the JSON would look like this: { 'firstName': 'John', 'lastName': 'Smith', 'age': 25, 'a
Solution 1:
It's the same as using Scrapy's HtmlXPathSelector for HTML responses. The only difference is that you use the json module to parse the response:
import json

class MySpider(BaseSpider):
    ...

    def parse(self, response):
        # Decode the JSON body into a Python dict
        jsonresponse = json.loads(response.text)
        item = MyItem()
        item["firstName"] = jsonresponse["firstName"]
        return item
Hope that helps.
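The parsing step can be tried on its own, without running a spider. This is a minimal sketch, assuming the response body is valid JSON shaped like the (truncated) example in the question. The function name parse_person and the sample body are made up for illustration, and a plain dict stands in for MyItem:

```python
import json

# Hypothetical sample body, built from the fields visible in the question.
SAMPLE_BODY = '{"firstName": "John", "lastName": "Smith", "age": 25}'


def parse_person(body_text):
    """Decode a JSON body and copy selected fields into an item-like dict."""
    # json.loads raises json.JSONDecodeError if the text is not valid JSON.
    data = json.loads(body_text)
    return {"firstName": data["firstName"], "lastName": data["lastName"]}


print(parse_person(SAMPLE_BODY))
```

Inside a spider, the same logic would receive response.text as body_text.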
Solution 2:
You don't need the json module to parse the response object. In Scrapy 2.2 and later, the response has its own json() method:
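class MySpider(BaseSpider):
    ...

    def parse(self, response):
        # response.json() decodes the JSON body into a Python object
        jsonresponse = response.json()
        item = MyItem()
        # .get() returns "" instead of raising if the key is missing
        item["firstName"] = jsonresponse.get("firstName", "")
        return item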
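Switching from jsonresponse["firstName"] to jsonresponse.get("firstName", "") also changes what happens when a key is absent. Here is a small sketch on a plain dict, which is what response.json() returns for a JSON object. The sample payload is an assumption:

```python
# Hypothetical decoded payload with no "firstName" key.
payload = {"lastName": "Smith", "age": 25}

# .get() falls back to the given default instead of raising.
first_name = payload.get("firstName", "")

# Indexing raises KeyError for the same missing key.
try:
    payload["firstName"]
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

print(repr(first_name), missing_key_raised)
```

Which behavior is preferable depends on whether a missing field should be treated as an error or as an empty value.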
Solution 3:
One possible reason the JSON is not loading is that it uses single quotes instead of double quotes, which json.loads rejects. Try this:
json.loads(response.text.replace("'", '"'))
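Replacing every single quote also rewrites apostrophes inside values, which can turn the text into invalid JSON. If the body is really a Python-style dict literal, as the single-quoted example in the question suggests, ast.literal_eval from the standard library is one alternative. This is a sketch under that assumption. The sample string and the helper name load_single_quoted are made up for illustration:

```python
import ast
import json


def load_single_quoted(text):
    """Try strict JSON first, then fall back to parsing a Python literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # literal_eval only accepts literals (dicts, lists, strings,
        # numbers, ...), so it does not execute arbitrary code.
        return ast.literal_eval(text)


# Hypothetical body: single-quoted keys plus a value containing an apostrophe.
body = "{'firstName': 'John', 'lastName': \"O'Brien\", 'age': 25}"
print(load_single_quoted(body))
```

This fallback does not handle JSON-only tokens such as true, false, or null, because they are not Python literals.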