The U Before Strings
Solution 1:
Use codecs.open() or io.open() to open a text file with an appropriate text encoding (via the encoding="..." argument) instead of opening a byte file with the built-in open(), which in Python 2 only accepts byte strings.
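A minimal sketch of Solution 1, assuming Python 3 (where io.open is the same function as the built-in open). The temporary file name example.txt and the choice of UTF-8 are assumptions for illustration only.

```python
import io
import os
import tempfile

# Illustrative text containing a non-ASCII character (U+2026 HORIZONTAL ELLIPSIS).
text = u"text\u2026"

# Hypothetical output location: a file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Text mode with an explicit encoding: io.open encodes str -> bytes on write.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the raw bytes back to see what was actually stored on disk.
with io.open(path, "rb") as f:
    raw = f.read()

# Reading in text mode with the same encoding decodes bytes -> str again.
with io.open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()

print(raw)                 # b'text\xe2\x80\xa6'
print(round_trip == text)  # True
```

The file object does the encoding, so the rest of the program can keep working with Unicode strings.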
Solution 2:
You see the representation of Unicode strings that are contained in the list. When you print a list, repr() is called on each item in it:
>>> s = u'text…'
>>> s
u'text\u2026'
>>> print(s)
text…
>>> print([s])  # <-- a list with a single item (the string)
[u'text\u2026']
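The session above is Python 2. As a sketch of the same mechanism in Python 3: the u prefix no longer appears in reprs and printable non-ASCII characters such as … are shown unescaped, but printing a list still shows repr() of each item. The built-in ascii() reproduces the escaped form seen in the Python 2 output.

```python
s = "text\u2026"

# str() of a string is the string itself; this is what print(s) writes.
assert str(s) == s

# str() of a list is built from repr() of each item, so quotes appear.
list_text = str([s])

# repr() in Python 3 keeps printable non-ASCII characters unescaped...
item_repr = repr(s)

# ...while ascii() escapes them, resembling the Python 2 output.
item_ascii = ascii(s)
list_ascii = ascii([s])

print(item_ascii)  # 'text\u2026'
print(list_ascii)  # ['text\u2026']
```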
u'' is the syntax for Unicode literals, used to define Unicode strings in Python source code. Note: if you use non-ASCII characters inside a string literal, you should declare the source code encoding at the top of the module, e.g. # -*- coding: utf-8 -*-.
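For comparison, a small sketch assuming Python 3.3 or later: the u'' prefix is accepted there for compatibility and has no effect, because every str literal is already Unicode. UTF-8 is also the default source encoding in Python 3, so the coding declaration is optional but harmless.

```python
# -*- coding: utf-8 -*-
# The declaration above is only honored on the first or second line of a file.

with_prefix = u"text…"          # literal non-ASCII character in the source
without_prefix = "text\u2026"   # same character written as an escape

# Both literals produce str objects with the same value in Python 3.
same_type = type(with_prefix) is type(without_prefix) is str
same_value = with_prefix == without_prefix

print(same_type, same_value)  # True True
```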
To fix a UnicodeEncodeError when writing to a file, you need to encode the Unicode strings to bytes. BeautifulSoup provides several HTML-specific ways to do it.
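The BeautifulSoup-specific methods are not shown here (the library is not assumed to be installed). This sketch shows only the underlying step, str.encode(), on a hypothetical HTML fragment, including the errors="xmlcharrefreplace" handler, which turns unencodable characters into numeric character references that remain valid HTML.

```python
# Hypothetical HTML fragment holding non-ASCII text.
html = u"<p>text\u2026</p>"

# Encoding to UTF-8 can represent every character, so it cannot fail.
utf8_bytes = html.encode("utf-8")

# Encoding to ASCII with the default errors="strict" reproduces the error.
try:
    html.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# errors="xmlcharrefreplace" substitutes a numeric character reference,
# which a browser still renders as the original character.
ascii_bytes = html.encode("ascii", errors="xmlcharrefreplace")

print(utf8_bytes)   # b'<p>text\xe2\x80\xa6</p>'
print(failed)       # True
print(ascii_bytes)  # b'<p>text&#8230;</p>'
```

The resulting bytes can then be written to a file opened in binary mode ("wb").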
Note: in general, the generic codecs.open() or io.open() suggested by @Ignacio Vazquez-Abrams won't be appropriate for HTML text, e.g. they don't update the <meta charset="..."> tag.
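A sketch of the pitfall described in the note, assuming Python 3 and a hypothetical page that declares UTF-8: writing it through a generic text file in a different encoding (Latin-1 here, chosen only for illustration) changes the bytes but leaves the declaration untouched.

```python
import io
import os
import tempfile

# Hypothetical document that declares UTF-8 in its <meta> tag.
html = u'<html><head><meta charset="utf-8"></head><body>caf\u00e9</body></html>'

path = os.path.join(tempfile.mkdtemp(), "page.html")

# A generic text-file API encodes characters but never parses the markup.
with io.open(path, "w", encoding="latin-1") as f:
    f.write(html)

with io.open(path, "rb") as f:
    raw = f.read()

# The body is now Latin-1 (é became the single byte 0xE9)...
print(b"caf\xe9" in raw)                 # True
# ...but the page still claims UTF-8, so a browser would misdecode it.
print(b'<meta charset="utf-8">' in raw)  # True
```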
Solution 3:
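Try converting them to strings (in Python 2, str() encodes with the default ASCII codec, so this raises UnicodeEncodeError for non-ASCII text such as "…"):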
for string in soup.stripped_strings:
    all_tds.append(str(string))
Here it is with a list comprehension:
all_tds = [str(string) for string in soup.stripped_strings]
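A runnable sketch of the list comprehension in Python 3, using a hypothetical iterator in place of soup.stripped_strings (BeautifulSoup itself is not used). It also imitates the Python 2 behavior of str() on a unicode object, which encodes with the default ASCII codec.

```python
# Hypothetical stand-in for soup.stripped_strings: an iterator of text fragments.
stripped_strings = iter([u"text\u2026", u"more text"])

# Python 3: str() on a str returns an equal str, so non-ASCII text is safe.
all_tds = [str(string) for string in stripped_strings]

# Python 2's str(u) encoded with the default ASCII codec, roughly like this:
try:
    u"text\u2026".encode("ascii")
    py2_style_ok = True
except UnicodeEncodeError:
    py2_style_ok = False

print(all_tds == ["text\u2026", "more text"])  # True
print(py2_style_ok)                            # False
```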