My Script Doesn't Seem To Go To The Next Pages And Doesn't Scrape All The Data I Would Like

March 07, 2024

Here's my script (I didn't include all the code for the sake of clarity, but I will explain some aspects in detail): from selenium import webdriver import time from selenium.webdr

Solution 1:

You need to change the rows parameter in your URL to rows=25. The response HTML then contains all 25 reviews for that page, so you can parse it directly:

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with the request.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"}
res = requests.get("https://www.booking.com/reviewlist.fr.html?cc1=fr&dist=1&pagename=hotelistria&type=total&offset=0&rows=25", headers=headers)

soup = BeautifulSoup(res.text, "html.parser")

# Each review is an <li> element with this class.
reviews = soup.find_all("li", class_="review_list_new_item_block")

Output:

len(reviews)

25
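The two query parameters that matter here are offset and rows. As a minimal sketch, the URL can be assembled from a dictionary so those two values are easy to change. The helper name build_review_url and the constant BASE_URL are hypothetical, introduced only for illustration; the parameter names and values are the ones from the URL above.

```python
from urllib.parse import urlencode

# Hypothetical constant: the review-list endpoint used in the answer above.
BASE_URL = "https://www.booking.com/reviewlist.fr.html"


def build_review_url(offset, rows=25, pagename="hotelistria"):
    """Return the review-list URL for a given offset (hypothetical helper)."""
    params = {
        "cc1": "fr",
        "dist": 1,
        "pagename": pagename,
        "type": "total",
        "offset": offset,  # index of the first review on the page
        "rows": rows,      # number of reviews per page
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_review_url(0))
# https://www.booking.com/reviewlist.fr.html?cc1=fr&dist=1&pagename=hotelistria&type=total&offset=0&rows=25
```

The resulting string can be passed to requests.get exactly like the hard-coded URL.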

The code above handles a single page. The code below handles all 61 pages: it reads the offset values from the first and last pagination links and uses them to request each page and extract its reviews.

import requests
from bs4 import BeautifulSoup


def find_page_val():
    """Return the offset step and the upper bound for the offset range."""
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"}
    res = requests.get("https://www.booking.com/reviewlist.fr.html?cc1=fr&dist=1&pagename=hotelistria&type=total&offset=0&rows=25", headers=headers)
    soup = BeautifulSoup(res.text, "html.parser")
    items = soup.find("div", class_="bui-pagination__pages").find_all("div", class_="bui-pagination__item")
    # The offset is the last "="-separated value in each pagination link.
    first = int(items[1].find("a")["href"].split("=")[-1])
    last = int(items[-1].find("a")["href"].split("=")[-1])
    # range() excludes its stop value, so add one step to include the last page.
    total = first + last
    return first, total


def connection(i):
    """Fetch the review page that starts at offset i and return the parsed HTML."""
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"}
    res = requests.get(f"https://www.booking.com/reviewlist.fr.html?cc1=fr&dist=1&pagename=hotelistria&type=total&offset={i}&rows=25", headers=headers)
    soup = BeautifulSoup(res.text, "html.parser")
    return soup


def get_list_reviews():
    """Visit every page and print how many reviews it contains."""
    first, total = find_page_val()
    for i in range(0, total, first):
        soup = connection(i)
        reviews = soup.find_all("li", class_="review_list_new_item_block")
        print(len(reviews))

Finally, call get_list_reviews(). It prints the number of reviews found on each page:

Output:

25
25
25
...
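To see why total is computed as first + last, here is a small sketch of the offset arithmetic on its own, without any network access. The function name page_offsets is hypothetical, and the value 1500 for the last offset is an assumption derived from 61 pages of 25 rows each, not a value read from the site.

```python
def page_offsets(first, last):
    """Return every page offset from 0 up to and including `last`.

    `first` is the offset of the second page (the step between pages);
    `last` is the offset of the final page. range() excludes its stop
    value, so the stop is `first + last` to keep the final page.
    """
    return list(range(0, first + last, first))


# Assumed values: 25 rows per page, 61 pages, so the last offset is 60 * 25.
offsets = page_offsets(25, 1500)
print(len(offsets), offsets[:3], offsets[-1])
# 61 [0, 25, 50] 1500
```

Using last alone as the stop value would skip the final page, which is why one extra step is added.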
