My Script Doesn't Seem To Go To The Next Pages And Doesn't Scrape All The Data I Would Like
Here's my script (I didn't include all the code for the sake of clarity, but I will explain some aspects in detail): from selenium import webdriver import time from selenium.webdr
Solution 1:
You need to change the rows part of your URL. With rows=25, all 25 reviews of a page are present in the returned HTML:
import requests
from bs4 import BeautifulSoup

# A browser-like User-Agent, sent with the request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"}
res = requests.get("https://www.booking.com/reviewlist.fr.html?cc1=fr&dist=1&pagename=hotelistria&type=total&offset=0&rows=25", headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
# Each review is an <li> element with this class
reviews = soup.find_all("li", class_="review_list_new_item_block")
Output:
len(reviews)
25
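Since only the offset value changes from page to page, it can help to build the URL from its parts instead of editing the string by hand. This is a minimal sketch, not part of the original answer: the helper name build_review_url is hypothetical, and the parameter values are simply the ones from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.booking.com/reviewlist.fr.html"

def build_review_url(offset=0, rows=25):
    """Return the review-list URL for a given offset (hypothetical helper)."""
    # Parameter names and values are copied from the URL used in the answer.
    params = {
        "cc1": "fr",
        "dist": 1,
        "pagename": "hotelistria",
        "type": "total",
        "offset": offset,
        "rows": rows,
    }
    return f"{BASE_URL}?{urlencode(params)}"

# Third page of results: the offset advances by `rows` each time.
print(build_review_url(offset=50))
```

With the default arguments the helper reproduces the URL used above, so it can be passed straight to requests.get.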
The code above handles a single page. The code below handles all 61 pages: it first reads the offset step and the last page's offset from the pagination links, then uses those values to request each page and extract its reviews.
import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"}

def find_page_val():
    # Load the first page and read the offsets from its pagination links
    res = requests.get("https://www.booking.com/reviewlist.fr.html?cc1=fr&dist=1&pagename=hotelistria&type=total&offset=0&rows=25", headers=headers)
    soup = BeautifulSoup(res.text, "html.parser")
    items = soup.find("div", class_="bui-pagination__pages").find_all("div", class_="bui-pagination__item")
    # Offset of the second page, which is also the step between pages
    first = int(items[1].find("a")["href"].split("=")[-1])
    # Offset of the last page
    last = int(items[-1].find("a")["href"].split("=")[-1])
    # range() excludes its upper bound, so add one step to include the last page
    total = first + last
    return first, total

def connection(i):
    # Fetch and parse the page that starts at offset i
    res = requests.get(f"https://www.booking.com/reviewlist.fr.html?cc1=fr&dist=1&pagename=hotelistria&type=total&offset={i}&rows=25", headers=headers)
    soup = BeautifulSoup(res.text, "html.parser")
    return soup

def get_list_reviews():
    first, total = find_page_val()
    for i in range(0, total, first):
        soup = connection(i)
        reviews = soup.find_all("li", class_="review_list_new_item_block")
        print(len(reviews))
Finally, call the function:
get_list_reviews()
It prints the number of reviews found on each page.
Output:
25
25
25
...
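The offset arithmetic can be checked without any network access. The sketch below is an illustration under assumptions, not the answer's own code: the helper names offset_from_href and page_offsets are hypothetical, the sample href is made up, and it assumes each pagination link carries an offset query parameter. It reads that parameter by name rather than with split("=")[-1], so it does not depend on offset being the last parameter in the link.

```python
from urllib.parse import urlparse, parse_qs

def offset_from_href(href):
    """Return the integer `offset` query parameter of a pagination link."""
    # Some sites separate query parameters with ";" instead of "&";
    # normalise so that parse_qs sees each parameter separately.
    query = urlparse(href).query.replace(";", "&")
    return int(parse_qs(query)["offset"][0])

def page_offsets(first, last):
    """Return every page offset from 0 up to and including `last`, stepping by `first`."""
    # range() excludes its upper bound, hence last + first.
    return list(range(0, last + first, first))

# Made-up href for illustration only.
step = offset_from_href("/reviewlist.fr.html?cc1=fr&pagename=hotelistria&offset=25")
offsets = page_offsets(step, 1500)
print(len(offsets))  # 61 pages when the step is 25 and the last offset is 1500
```

A last offset of 1500 with a step of 25 gives the 61 pages mentioned above, which is why the loop uses first + last as its upper bound.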