
How To Append Items From Scrapy Spider To List?

I'm using a basic spider that gets particular information from links on a website. My code looks like this:

import sys
from scrapy import Request
import urllib.parse as urlparse

Solution 1:

To save results, you should use Scrapy's Feed Exports feature, as described in the documentation here.

One of the most frequently required features when implementing scrapers is being able to store the scraped data properly and, quite often, that means generating an “export file” with the scraped data (commonly called “export feed”) to be consumed by other systems.

Scrapy provides this functionality out of the box with the Feed Exports, which allows you to generate a feed with the scraped items, using multiple serialization formats and storage backends.

See the csv section for your case.
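If you would rather not touch the spider code at all, the same CSV feed export can also be triggered from the command line with Scrapy's -o option (the spider name below is the one used in the example further down and is only illustrative):

scrapy crawl feed_exporter_test -o test.csv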

Another, more custom, approach would be to use Scrapy's Item Pipelines. There's an example of a simple JSON writer here that could easily be modified to output CSV or any other format.
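A minimal sketch of such a pipeline, assuming the items are plain dicts like those yielded by the spider below; the class name JsonWriterPipeline and the output filename items.jl are illustrative, and the pipeline still has to be enabled via the ITEM_PIPELINES setting:

import json

class JsonWriterPipeline:
    def open_spider(self, spider):
        # open the output file once when the spider starts
        self.file = open('items.jl', 'w')

    def close_spider(self, spider):
        # close it when the spider finishes
        self.file.close()

    def process_item(self, item, spider):
        # write each scraped item as one JSON line
        self.file.write(json.dumps(dict(item)) + "\n")
        return item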

For example, this piece of code would output all items to a test.csv file in the project directory:

import scrapy
class MySpider(scrapy.Spider):
    name = 'feed_exporter_test'
    # this is equivalent to what you would set in the settings.py file
    custom_settings = {
        'FEED_FORMAT': 'csv',
        'FEED_URI': 'test.csv'
    }
    start_urls = ['http://stackoverflow.com/questions/tagged/scrapy']

    def parse(self, response):
        titles = response.xpath("//a[@class='question-hyperlink']/text()").extract()
        for i, title in enumerate(titles):
            yield {'index': i, 'title': title}

This example generates a 50-row CSV file.
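Assuming the spider above is saved as feed_exporter_test.py, it can be run standalone with scrapy runspider feed_exporter_test.py, or with scrapy crawl feed_exporter_test from inside a project; either way the custom_settings take effect and test.csv is written relative to the directory the command is run from.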
