Python Beautiful Soup Can't Find Specific Table
Solution 1:
As Jarett mentioned above, BeautifulSoup can't find your table. In this case it's because the table is commented out in the page source. This is admittedly an amateurish approach, but it works for your data:
# html is the requests response for the page, e.g. html = requests.get(url)
table_src = html.text.split('<div class="overthrow table_container" id="div_team-stats-per_game">')[1].split('</table>')[0] + '</table>'
table = BeautifulSoup(table_src, 'lxml')
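The same string slicing can be wrapped in a small function. This is a minimal sketch: extract_table_html is a hypothetical helper, and it assumes the container's id appears in the raw source exactly as id="...". It locates the container by id only, so it does not break if the class attribute changes, and then returns the first <table>...</table> that follows it.

```python
def extract_table_html(page_text, div_id):
    """Return the first <table>...</table> that follows the element with id=div_id.

    Plain string slicing, like the split() one-liner: it works even when the
    table sits inside an HTML comment, because no HTML parsing happens here.
    """
    marker = 'id="{}"'.format(div_id)
    start = page_text.find(marker)
    if start == -1:
        raise ValueError("container {!r} not found".format(div_id))

    # First opening <table after the container...
    table_start = page_text.find("<table", start)
    if table_start == -1:
        raise ValueError("no <table> after container {!r}".format(div_id))

    # ...and the first closing tag after that opening tag.
    table_end = page_text.find("</table>", table_start)
    if table_end == -1:
        raise ValueError("unterminated <table> after {!r}".format(div_id))

    return page_text[table_start:table_end + len("</table>")]
```

The returned string can then be passed to BeautifulSoup(..., 'lxml') exactly as in the one-liner.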
Solution 2:
The tables are rendered after the page loads, so you'd need to use Selenium to let them render, or use the approach mentioned above. But that isn't necessary, because most of the tables are inside HTML comments in the page source. You can use BeautifulSoup to pull out the comments, then search through those for the table tags.
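To see what the comment-scanning step does on its own, here is a minimal sketch using only the standard library's html.parser, whose handle_comment hook receives the text between <!-- and -->. CommentCollector and commented_tables are hypothetical names, and checking for "<table" (rather than just "table") is a choice made here to skip comments that merely mention the word.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment that contains a <table> tag."""

    def __init__(self):
        super().__init__()
        self.table_comments = []

    def handle_comment(self, data):
        # data is the raw text between <!-- and -->; it is not parsed further,
        # so a commented-out table arrives here as a plain HTML string.
        if "<table" in data:
            self.table_comments.append(data)


def commented_tables(page_html):
    """Return the HTML of every commented-out table in page_html."""
    parser = CommentCollector()
    parser.feed(page_html)
    parser.close()
    return parser.table_comments
```

Each returned string is ordinary HTML again, so it can be handed to BeautifulSoup or pandas for parsing.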
import requests
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from bs4 import Comment

# NBA season
year = 2019
url = 'https://www.basketball-reference.com/leagues/NBA_{}.html#all_team-stats-base'.format(year)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Every HTML comment in the page, as a string
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

tables = []
for each in comments:
    if 'table' in each:
        try:
            # read_html returns a list of DataFrames; keep the first one
            tables.append(pd.read_html(StringIO(each))[0])
        except ValueError:
            # raised when the comment contains no parsable table
            continue
This gives you a list of DataFrames, so just pull out the table you want by its index position:
print(tables[3])
Output:
      Rk                     Team   G     MP    FG  ...  STL  BLK   TOV    PF   PTS
0    1.0         Milwaukee Bucks*  82  19780  3555  ...  615  486  1137  1608  9686
1    2.0   Golden State Warriors*  82  19805  3612  ...  625  525  1169  1757  9650
2    3.0     New Orleans Pelicans  82  19755  3581  ...  610  441  1215  1732  9466
3    4.0      Philadelphia 76ers*  82  19805  3407  ...  606  432  1223  1745  9445
4    5.0    Los Angeles Clippers*  82  19830  3384  ...  561  385  1193  1913  9442
5    6.0  Portland Trail Blazers*  82  19855  3470  ...  546  413  1135  1669  9402
6    7.0   Oklahoma City Thunder*  82  19855  3497  ...  766  425  1145  1839  9387
7    8.0         Toronto Raptors*  82  19880  3460  ...  680  437  1150  1724  9384
8    9.0         Sacramento Kings  82  19730  3541  ...  679  363  1095  1751  9363
9   10.0       Washington Wizards  82  19930  3456  ...  683  379  1154  1701  9350
10  11.0         Houston Rockets*  82  19830  3218  ...  700  405  1094  1803  9341
11  12.0            Atlanta Hawks  82  19855  3392  ...  675  419  1397  1932  9294
12  13.0   Minnesota Timberwolves  82  19830  3413  ...  683  411  1074  1664  9223
13  14.0          Boston Celtics*  82  19780  3451  ...  706  435  1052  1670  9216
14  15.0           Brooklyn Nets*  82  19980  3301  ...  539  339  1236  1763  9204
15  16.0       Los Angeles Lakers  82  19780  3491  ...  618  440  1284  1701  9165
16  17.0               Utah Jazz*  82  19755  3314  ...  663  483  1240  1728  9161
17  18.0       San Antonio Spurs*  82  19805  3468  ...  501  386   992  1487  9156
18  19.0        Charlotte Hornets  82  19830  3297  ...  591  405  1001  1550  9081
19  20.0          Denver Nuggets*  82  19730  3439  ...  634  363  1102  1644  9075
20  21.0         Dallas Mavericks  82  19780  3182  ...  533  351  1167  1650  8927
21  22.0          Indiana Pacers*  82  19705  3390  ...  713  404  1122  1594  8857
22  23.0             Phoenix Suns  82  19880  3289  ...  735  418  1279  1932  8815
23  24.0           Orlando Magic*  82  19780  3316  ...  543  445  1082  1526  8800
24  25.0         Detroit Pistons*  82  19855  3185  ...  569  331  1135  1811  8778
25  26.0               Miami Heat  82  19730  3251  ...  627  448  1208  1712  8668
26  27.0            Chicago Bulls  82  19905  3266  ...  603  351  1159  1663  8605
27  28.0          New York Knicks  82  19780  3134  ...  557  422  1151  1713  8575
28  29.0      Cleveland Cavaliers  82  19755  3189  ...  534  195  1106  1642  8567
29  30.0        Memphis Grizzlies  82  19880  3113  ...  684  448  1147  1801  8490
30   NaN           League Average  82  19815  3369  ...  626  406  1155  1714  9119

[31 rows x 25 columns]
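Index positions can shift if the site adds or reorders tables. As an alternative, here is a minimal sketch that picks the comment containing a table with a given id. find_comment_with_table is a hypothetical helper, and it assumes the id attribute is double-quoted and preceded by whitespace; the id team-stats-per_game is the one used in Solution 3.

```python
import re


def find_comment_with_table(comments, table_id):
    """Return the first comment string holding <table ... id="table_id">, else None."""
    # \s before id= avoids matching attributes such as data-id="..."
    pattern = re.compile(r'<table\b[^>]*\sid="' + re.escape(table_id) + '"')
    for text in comments:
        if pattern.search(text):
            return text
    return None
```

The matching comment can then go through pd.read_html, wrapping the string in io.StringIO (newer pandas versions deprecate passing literal HTML). pd.read_html also accepts attrs={'id': 'team-stats-per_game'} to keep only tables with that attribute.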
Solution 3:
As the other answers mentioned, this happens because part of the page content is loaded with JavaScript, and fetching the source with urllib's urlopen or requests does not run that JavaScript, so the dynamic part is missing.
A way around this is to use Selenium to let the dynamic content load, then take the page source from the browser and search it for the table. The code below gives the result you expected, but you will need to set up a Selenium WebDriver for Firefox first.
from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver


def parse(url):
    # Open the page in a real browser so its JavaScript runs
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        sleep(3)  # crude wait for the dynamic content to load
        return driver.page_source
    finally:
        driver.quit()  # always close the browser


year = 2019
url = 'https://www.basketball-reference.com/leagues/NBA_{}.html#all_team-stats-base'.format(year)
soup = BeautifulSoup(parse(url), 'lxml')
x = soup.find("table", id="team-stats-per_game")
print(x)
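A fixed sleep(3) may be too short on a slow connection and wastes time on a fast one. A common alternative is Selenium's explicit wait, which polls until a condition holds. This sketch assumes Firefox and its driver are set up, and that the page's JavaScript puts an element with id team-stats-per_game into the DOM; fetch_rendered_html is a hypothetical name.

```python
def fetch_rendered_html(url, table_id, timeout=10):
    """Load url in Firefox and return page_source once element table_id exists."""
    # Imported inside the function so this module can be imported
    # even where selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Poll until an element with this id is in the DOM; raises
        # TimeoutException if it has not appeared after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, table_id))
        )
        return driver.page_source
    finally:
        driver.quit()
```

BeautifulSoup(fetch_rendered_html(url, 'team-stats-per_game'), 'lxml') would then take the place of the parse() call.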
Hope this helps, and feel free to ask if you have any further questions.
Happy coding :)