Error Extracting Text From Website: AttributeError: 'NoneType' Object Has No Attribute 'get_text'
Solution 1:
You're very close; there are a couple of things I'd recommend. First, take a closer look at the HTML: in this case the author names are actually in a ul, where each li contains a span whose itemprop is 'name'. However, not all articles have author names at all. In that case, with your current code, the call to links.find('div', {'itemprop': 'name'}) returns None. None, of course, has no attribute get_text, so that line throws an error, and because of the surrounding try/except the only visible effect is that no value gets appended to the data2['author'] list. I'd recommend storing the author(s) in a list like so:
authors = []
ul = links.find('ul', itemprop='creator')
if ul is not None:  # find() returns None when there is no author list
    for author in ul.find_all('span', itemprop='name'):
        authors.append(author.text.strip())
data2['authors'].append(authors)
This handles the case where there are no authors as we would expect: authors is simply an empty list.
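To see the difference on something small, here is a minimal sketch that runs the same lookup against made-up HTML. The markup, the extract_authors helper and the author names are all assumptions for illustration, not the real page; only find, find_all and get_text are actual BeautifulSoup calls.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one article with an author list, one without.
HTML = """
<article>
  <ul itemprop="creator">
    <li><span itemprop="name"> Ada Example </span></li>
    <li><span itemprop="name">Bob Sample</span></li>
  </ul>
</article>
<article>
  <h3>No author list here</h3>
</article>
"""


def extract_authors(article):
    """Return the author names found in one article tag, or [] if none."""
    ul = article.find('ul', itemprop='creator')
    if ul is None:  # find() returns None when nothing matches
        return []
    return [span.get_text(strip=True)
            for span in ul.find_all('span', itemprop='name')]


soup = BeautifulSoup(HTML, 'html.parser')
authors_per_article = [extract_authors(a) for a in soup.find_all('article')]
print(authors_per_article)
```

Every article contributes exactly one entry, so the list of authors stays aligned with whatever other per-article lists you are building.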
As a side note, putting your code inside a
try:
    ...
except:
    pass
construct is generally considered poor practice, for exactly the reason you're seeing now. Ignoring errors silently can give your program the appearance of running properly, while in fact any number of things could be going wrong. At the very least it's rarely a bad idea to print error info to stdout. Even just doing something like this is better than nothing:
try:
    ...
except Exception as exc:
    print(exc.__class__.__name__, exc)
For debugging, however, having the full traceback is often desirable as well. For this you can use the traceback module.
import traceback

try:
    ...
except Exception:
    # Prints the full traceback to stderr and lets the program continue.
    traceback.print_exc()
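If you would rather keep scraping and review the failures afterwards, something along these lines might work. It is only a sketch: FakeTag is a hypothetical stand-in for a parsed tag so the example runs without any HTML, and None plays the role of a find() call that matched nothing.

```python
import traceback


class FakeTag:
    """Hypothetical stand-in for a parsed tag, so the sketch needs no HTML."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


# None mimics a find() call that matched nothing.
items = [FakeTag('Mark Zastrow'), None, FakeTag('Irina Shevnina')]

names = []
failures = []  # (index, traceback text) for every item that raised
for index, item in enumerate(items):
    try:
        names.append(item.get_text())
    except Exception:
        # format_exc() returns the traceback as a string instead of printing it.
        failures.append((index, traceback.format_exc()))

print(names)
for index, tb in failures:
    print(f'item {index} failed:\n{tb}')
```

The loop still finishes, but nothing is hidden: each failure is stored with its position and full traceback.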
Solution 2:
Instead of using the strip method, collect all the matching items in a variable, then use a for loop and read .text from each one:
author = links.find_all('span', {"itemprop": "name"})
for i in author:
    data2["author"].append(i.text)
Printing data2 afterwards gives:
'author': ['Mark Zastrow', 'Barbara Mühlemann', 'Terry C. Jones', 'Peter de Barros Damgaard', 'Morten E. Allentoft', 'Irina Shevnina', 'Andrey Logvin', 'Emma Usmanova', ......