Crawl Data From a Website Using Scrapy 1.5.0 - Python

I am trying to crawl data from a website with Scrapy (1.5.0) in Python. Project directory: stack/ scrapy.cfg stack/

Solution 1:

Set the user agent.

Go to your Scrapy project's settings file (settings.py) and paste this in:

USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'

Solution 2:

If you just want to fetch the website and get its source code, this might help.

import urllib.request as req

def imLS():
    url = ""  # target URL (left empty in the original)
    data = req.Request(url)
    resp = req.urlopen(data)
    respData = resp.read()  # raw page source as bytes
    return respData
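
Some sites reject the default urllib user agent, so it may help to combine this with the user agent from Solution 1. Below is a minimal sketch of that idea using only the standard library. The helper names `build_request` and `fetch_source`, the `timeout` value, and the UTF-8 fallback are my own assumptions, not part of the original answer.

```python
import urllib.request as req

# Same browser-like user agent string as in Solution 1.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
)


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request that carries a browser-like User-Agent header."""
    return req.Request(url, headers={"User-Agent": user_agent})


def fetch_source(url, timeout=10):
    """Download the page and return its body decoded as text."""
    with req.urlopen(build_request(url), timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch_source` with your own URL should return the page source as a string. Building the request is kept separate from fetching so the headers can be inspected without touching the network.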

Solution 3:

Solution 4:

To parse each page, you need to add a little bit of code.

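import re

from scrapy import Spider
from scrapy.selector import Selector


# The class line is missing from the original; this class name is a placeholder.
class BatdongsanSpider(Spider):
    name = "batdongsan"
    allowed_domains = ["<DOMAIN>"]
    start_urls = [
        # The start URL(s) are missing from the original; add yours here.
    ]

    def parse(self, response):
        questions = Selector(response).xpath('//div[@class="p-title"]/h3')

        # This part of the code collects only titles. Add more fields here if you need them.
        for question in questions:
            # The original XPath was cut off; 'string(.)' (the heading's full text) is a stand-in.
            title = question.xpath('string(.)').extract_first().strip()
            yield {'title': title}

        # Schedule the other pages only from a URL with no digits in it (the first page).
        # The function name was lost in the original; re.search is assumed.
        if not re.search(r'\d+', response.url):
            # Now we have to go through the remaining pages.
            url_prefix = response.css('div.background-pager-right-controls a::attr(href)').extract_first()
            url_last = response.css('div.background-pager-right-controls a::attr(href)').extract()[-1]
            last_page = re.findall(r'\d+', url_last)[0]
            for n in range(2, int(last_page) + 1):
                next_page = url_prefix + '/p' + str(n)
                yield response.follow(next_page, callback=self.parse)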

Replace <DOMAIN> with your domain. Also, I didn't use an Item class in my code.
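
The pagination step in this answer can also be tried without running a spider. Here is a rough sketch of the same logic as a plain function. The function name `page_urls` and the example paths are hypothetical; the `/p<n>` suffix and the "first number in the last pager link" rule are taken from the code in this answer.

```python
import re


def page_urls(url_prefix, url_last):
    """Build the URLs for pages 2..N, where N is the first number in url_last."""
    numbers = re.findall(r"\d+", url_last)
    if not numbers:
        # No page number in the last pager link: nothing to follow.
        return []
    last_page = int(numbers[0])
    return [url_prefix + "/p" + str(n) for n in range(2, last_page + 1)]


# Hypothetical pager links, for illustration only.
print(page_urls("/listing", "/listing/p4"))
# ['/listing/p2', '/listing/p3', '/listing/p4']
```

Note that the regex takes the first run of digits in the link, so a link with digits earlier in its path would give the wrong page count. If your site has such URLs, you would probably want a stricter pattern.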

