
Scraping A Json Response With Scrapy

How do you use Scrapy to scrape web requests that return JSON? For example, the JSON would look like this: { 'firstName': 'John', 'lastName': 'Smith', 'age': 25, 'a

Solution 1:

It's the same as using Scrapy's HtmlXPathSelector (now Selector) for HTML responses. The only difference is that you use the json module to parse the response:

import json

import scrapy


class MySpider(scrapy.Spider):
    ...

    def parse(self, response):
        # Decode the response body and parse it as JSON
        jsonresponse = json.loads(response.text)

        item = MyItem()
        item["firstName"] = jsonresponse["firstName"]

        return item

Hope that helps.
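
To try the parsing step without running a spider, here is a minimal sketch using only the standard library. The sample body and the extract_item helper are hypothetical; the sample uses only the fields visible in the question, written with double quotes so that it is valid JSON.

```python
import json

# Hypothetical sample body, limited to the fields visible in the question.
SAMPLE_BODY = '{"firstName": "John", "lastName": "Smith", "age": 25}'


def extract_item(body_text):
    """Parse a JSON response body and pick out the fields of interest."""
    # json.loads raises json.JSONDecodeError if the text is not valid JSON
    data = json.loads(body_text)
    return {
        "firstName": data["firstName"],
        "lastName": data["lastName"],
        "age": data["age"],
    }


item = extract_item(SAMPLE_BODY)
print(item["firstName"], item["lastName"], item["age"])
```

In a spider, body_text would be response.text, and the returned dict could be yielded directly or copied into an item.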

Solution 2:

You don't need the json module to parse the response object. Since Scrapy 2.2, text responses have a json() method:

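import scrapy


class MySpider(scrapy.Spider):
    ...

    def parse(self, response):
        # response.json() deserializes the JSON body (Scrapy 2.2+)
        jsonresponse = response.json()

        item = MyItem()
        # .get() falls back to "" if the key is missing
        item["firstName"] = jsonresponse.get("firstName", "")

        return item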
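
The snippet above reads the field with .get() rather than square brackets. A small sketch of the difference, using plain dicts in place of a parsed response (the payloads and the first_name helper are made up for illustration):

```python
import json

# Hypothetical payloads: one with the key, one without it.
complete = json.loads('{"firstName": "John"}')
partial = json.loads('{"lastName": "Smith"}')


def first_name(data, default=""):
    """Return the firstName field, or a default if the key is absent."""
    return data.get("firstName", default)


print(repr(first_name(complete)))
print(repr(first_name(partial)))

# Square-bracket access on the same payload raises KeyError instead.
try:
    partial["firstName"]
except KeyError as exc:
    print("missing key:", exc)
```

Which behaviour is preferable depends on the API: a default keeps the crawl going when a field is optional, while a KeyError surfaces an unexpected response shape early.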

Solution 3:

A possible reason the JSON is not loading is that its strings are wrapped in single quotes, as in the question's example; JSON requires double quotes. Try this:

json.loads(response.text.replace("'", '"'))
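
One caveat with the quote replacement: it also rewrites apostrophes inside values, which can leave the text unparseable. As an alternative (a different technique from the one in this answer), the sketch below tries strict JSON first and falls back to ast.literal_eval, which reads Python-literal syntax. The sample text and the parse_lenient helper are hypothetical, and the fallback assumes the body really is a Python-style literal.

```python
import ast
import json

# Hypothetical body: single-quoted keys and a value containing an apostrophe.
SINGLE_QUOTED = """{'firstName': 'John', 'lastName': "O'Brien", 'age': 25}"""


def parse_lenient(text):
    """Try strict JSON first, then fall back to Python-literal syntax."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # literal_eval only evaluates literals (dicts, lists, strings,
        # numbers, ...), so it does not execute arbitrary code.
        return ast.literal_eval(text)


data = parse_lenient(SINGLE_QUOTED)
print(data["lastName"])

# The plain quote replacement fails on the same input.
try:
    json.loads(SINGLE_QUOTED.replace("'", '"'))
except json.JSONDecodeError:
    print("quote replacement produced invalid JSON")
```

Note that ast.literal_eval does not understand the JSON keywords true, false, and null, so this fallback only helps when the server emits Python-style literals.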
