The U Before Strings
Solution 1:
Use codecs.open() or io.open() to open a text file using an appropriate text encoding (i.e. encoding="...") instead of opening a byte file with open().
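A minimal sketch of the io.open() approach, assuming UTF-8 is the right encoding for the data. The file name and the sample text are made up for illustration:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical file name, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Text mode with an explicit encoding: write Unicode text, io encodes it.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"text\u2026")

# Reading back with the same encoding yields Unicode text, not raw bytes.
with io.open(path, "r", encoding="utf-8") as f:
    content = f.read()

# Binary mode shows the bytes that were actually stored on disk.
with io.open(path, "rb") as f:
    raw = f.read()
```

Here content is the original Unicode string, while raw holds its UTF-8 encoded bytes. io.open() behaves the same way on Python 2.6+ and Python 3, where it is the built-in open().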
Solution 2:
You see the representation of Unicode strings that are contained in the list. When you print a list, repr() is called on each item in it:
>>> s = u'text…'
>>> s
u'text\u2026'
>>> print(s)
text…
>>> print([s]) # <-- a list with a single item (the string)
[u'text\u2026']
u'' is the syntax for Unicode literals, which may be used to define Unicode strings in Python source code. Note: if you use non-ASCII characters inside a string literal, you should declare the source code encoding at the top of the module, e.g., # -*- coding: utf-8 -*-.
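To display the text rather than the repr() form, work with the items themselves instead of the list. A small sketch, with sample data that is not taken from the question:

```python
# -*- coding: utf-8 -*-
# Sample data for illustration only; it is not taken from the question.
items = [u"text\u2026", u"caf\u00e9"]

# What print(items) displays: the list's repr(), built from repr() of each item.
# On Python 2 this contains u'...' prefixes and \uXXXX escapes.
list_display = repr(items)

# What to print instead: the items themselves, joined into one Unicode string.
joined = u", ".join(items)

# The data is the same either way; only the displayed form differs.
same_text = items[0] == u"text\u2026"
```

Printing joined (or each item in a loop) shows the characters themselves, provided the terminal encoding can represent them.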
To fix UnicodeEncodeError when writing to a file, you need to convert Unicode strings to bytes. BeautifulSoup provides several html-specific ways to do it.
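The conversion to bytes can be sketched with the generic encode() method. This does not use BeautifulSoup's HTML-specific methods; the markup, the output path, and the choice of UTF-8 are assumptions for illustration:

```python
# -*- coding: utf-8 -*-
import os
import tempfile

html_text = u"<p>text\u2026</p>"  # sample Unicode markup for illustration

# Encoding turns the Unicode string into bytes in the chosen encoding.
html_bytes = html_text.encode("utf-8")

# Hypothetical output path; binary mode writes the bytes unchanged.
path = os.path.join(tempfile.mkdtemp(), "page.html")
with open(path, "wb") as f:
    f.write(html_bytes)

# Decoding with the same encoding recovers the original Unicode text.
with open(path, "rb") as f:
    round_trip = f.read().decode("utf-8")
```

Because the file is opened in binary mode, no implicit ASCII encoding takes place on write, which is where the UnicodeEncodeError would otherwise come from.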
Note: In general, the generic codecs.open() or io.open() suggested by @Ignacio Vazquez-Abrams won't be appropriate for HTML text; e.g., they don't modify the <meta charset="..."> tag.
Solution 3:
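Try converting them to strings (on Python 2, str() only succeeds when the text is pure ASCII; otherwise it raises UnicodeEncodeError):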
for string in soup.stripped_strings:
    all_tds.append(str(string))
Here it is with a list comprehension:
all_tds = [str(string) for string in soup.stripped_strings]