How Do I Extract Text Data In First Column From Wikipedia Table?
I have been trying to develop a scraper class which will read Wikipedia data into a JSON file. I need to be able to read tables, extract links from the first columns, retrieve info
Solution 1:
The following script should fetch the required data from the first column of that table. I've used a hardcoded slice ([1:]) at the end of .find_all() to skip the header row. Give it a shot:
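import requests
from bs4 import BeautifulSoup

url = "https://simple.wikipedia.org/wiki/List_of_countries"
res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")

# [1:] skips the header row of the first table with class "wikitable"
for items in soup.find(class_="wikitable").find_all("tr")[1:]:
    cell = items.find("td")  # first data cell in the row
    if cell is None:  # skip any remaining rows without data cells
        continue
    data = cell.get_text(strip=True)
    print(data)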
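The question also asks for the links in the first column and for JSON output. Below is a minimal sketch that extends the same BeautifulSoup approach. It assumes the first column is made of <td> cells and that links are relative /wiki/ paths. The function name first_column_entries, the base_url default and SAMPLE_HTML are hypothetical, used only for illustration. It uses the built-in "html.parser" so it runs without lxml, and it works on an offline sample instead of the live page.

```python
import json
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup shaped like a Wikipedia "wikitable".
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Country</th><th>Capital</th></tr>
  <tr><td><a href="/wiki/France">France</a></td><td>Paris</td></tr>
  <tr><td><a href="/wiki/Japan">Japan</a></td><td>Tokyo</td></tr>
</table>
"""


def first_column_entries(html, base_url="https://simple.wikipedia.org"):
    """Collect text and absolute links from the first <td> of each row
    in the first table with class "wikitable" (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    table = soup.find("table", class_="wikitable")
    if table is None:
        return []
    entries = []
    for row in table.find_all("tr"):
        cell = row.find("td")
        if cell is None:  # header rows contain only <th> cells
            continue
        # Resolve relative hrefs such as /wiki/France against base_url.
        links = [urljoin(base_url, a["href"]) for a in cell.find_all("a", href=True)]
        entries.append({"text": cell.get_text(strip=True), "links": links})
    return entries


if __name__ == "__main__":
    # For the live page, pass requests.get(url).text instead of SAMPLE_HTML.
    print(json.dumps(first_column_entries(SAMPLE_HTML), indent=2, ensure_ascii=False))
```

To save the result instead of printing it, json.dump(entries, f, indent=2, ensure_ascii=False) on a file opened with encoding="utf-8" writes the same list of dictionaries to disk. Skipping rows that have no <td> also avoids relying on the hardcoded [1:] slice when a table has more than one header row.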