Navigating Through Pagination With Selenium In Python

May 24, 2024 Post a Comment

I'm scraping this website using Python and Selenium. I have the code working but it currently only scrapes the first page, I would like to iterate through all the pages and scrape

Solution 1:

Before moving on to automating any scenario, always write down the manual steps you would perform to execute the scenario. Manual steps for what you want to (which I understand from the question) is -

1) Go to site - https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList

2) Select first week option

3) Click search

4) Get the data from every page

5) Load the url again

6) Select second week option

7) Click search

8) Get the data from every page

.. and so on.

You are having a loop to select different weeks but inside each loop iteration for the week option, you also need to include a loop to iterate over all the pages. Since you are not doing that, your code is returning only the data from the first page.

Another problem is with how you are locaing the 'Next' button -

driver.find_element_by_xpath('//*[@id="form1"]/div[3]/a[4]').click()

You are selecting the 4th <a> element which is ofcourse not robust because in different pages, the Next button's index will be different. Instead, use this better locator -

driver.find_element_by_xpath("//a[contains(text(),'Next')]").click()

Logic for creating loop which will iterate through pages -

First you will need the number of pages. I did that by locating the <a>immediately before the "Next" button. As per the screenshot below, it is clear that the text of this element will be equal to the number of pages -

I did that using following code -

number_of_pages = int(driver.find_element_by_xpath("//a[contains(text(),'Next')]/preceding-sibling::a[1]").text)

Now once you have number of pages as number_of_pages, you only need to click "Next" button number_of_pages - 1 times!

Final code for your main function-

def main():
 all_data = []
 select = Select(driver.find_element_by_xpath("//select[@class='formitem' and @id='selWeek']"))
 list_options = select.options

 for item in range(len(list_options)):
    select = Select(driver.find_element_by_xpath("//select[@class='formitem' and @id='selWeek']"))
    select.select_by_index(str(item))
    driver.find_element_by_css_selector("input.formbutton#csbtnSearch").click()
    number_of_pages = int(driver.find_element_by_xpath("//a[contains(text(),'Next')]/preceding-sibling::a[1]").text)
    for j in range(number_of_pages - 1):
      all_data.extend(getData())
      driver.find_element_by_xpath("//a[contains(text(),'Next')]").click()
      time.sleep(1)
    driver.get(url)

 with open( 'wiltshire.json', 'w+' ) as f:
    json.dump( all_data, f )
 driver.quit()

Solution 2:

first get the total pages in the pagination, using

ins.get('https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList/10702380,1')
ins.find_element_by_class_name("pagination")
source = BeautifulSoup(ins.page_source)
div = source.find_all('div', {'class':'pagination'})
all_as = div[0].find_all('a')
total = 0for i in range(len(all_as)):
    if'Next' in all_as[i].text:
        total = all_as[i-1].text
        break

Now just loop through the range

for i in range(total):
 ins.get('https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList/10702380,{}'.format(count))

keep incrementing the count and get the source code for the page and then get the data for it. Note: Don't forget the sleep when clicking on going form one page to another

Solution 3:

Following approach is simply worked for me.

driver.find_element_by_link_text("3").click()
driver.find_element_by_link_text("4").click()
....
driver.find_element_by_link_text("Next").click()

lacucinadiadine

Navigating Through Pagination With Selenium In Python

Solution 1:

Solution 2:

Solution 3:

Post a Comment for "Navigating Through Pagination With Selenium In Python"

Widget HTML #3