How Do I Extract Text Data In First Column From Wikipedia Table?

I have been trying to develop a scraper class which will read Wikipedia data into a JSON file. I need to be able to read tables, extract links from the first columns, retrieve info

Solution 1:

The following script should fetch the text of the first column of that table. I've used a hardcoded slice ([1:]) at the end of .find_all() to skip the header row. Give it a shot:

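import requests
from bs4 import BeautifulSoup

url = "https://simple.wikipedia.org/wiki/List_of_countries"

res = requests.get(url)
res.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(res.text, "lxml")  # needs the lxml package; "html.parser" also works

# [1:] skips the header row of the first element with class "wikitable"
for row in soup.find(class_="wikitable").find_all("tr")[1:]:
    cell = row.find("td")
    if cell is None:  # rows made up only of <th> cells have no <td>
        continue
    print(cell.get_text(strip=True))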
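
Since the question also asks for the links in the first column and for JSON output, here is a rough sketch of how that could look. It is only an illustration, not part of the original answer: the function name first_column_links, the inline SAMPLE_HTML (made-up rows, so it runs without a network request), and the use of "html.parser" instead of "lxml" are all my assumptions.

```python
import json
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def first_column_links(html, base_url):
    """Return [{"text": ..., "url": ...}] for the first <td> of each table row.

    Hypothetical helper: the name and the output shape are assumptions.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    if table is None:
        return []
    records = []
    for tr in table.find_all("tr"):
        cell = tr.find("td")  # header rows hold only <th>, so this is None
        if cell is None:
            continue
        link = cell.find("a", href=True)
        records.append({
            "text": cell.get_text(strip=True),
            # Wikipedia hrefs are relative ("/wiki/..."), so resolve them
            "url": urljoin(base_url, link["href"]) if link else None,
        })
    return records


# Made-up sample rows standing in for the real page.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Country</th><th>Capital</th></tr>
  <tr><td><a href="/wiki/France">France</a></td><td>Paris</td></tr>
  <tr><td>Atlantis</td><td>Unknown</td></tr>
</table>
"""

records = first_column_links(
    SAMPLE_HTML, "https://simple.wikipedia.org/wiki/List_of_countries"
)
print(json.dumps(records, indent=2))
```

To run it against the live page, pass res.text from the script above in place of SAMPLE_HTML, and write the result to a file with json.dump if you need it on disk.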
