Removing Non-ASCII Characters In A CSV File
Solution 1:
If you really want to strip it, try:
import unicodedata
ascii_title = unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')
* WARNING: THIS WILL MODIFY YOUR DATA * It attempts to find a close ASCII match for each character - e.g. ć -> c
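For example, assuming a Python 3 session, NFKD normalization decomposes accented characters into a base letter plus combining marks, and the combining marks are then dropped by the ASCII encode:

```python
import unicodedata

def strip_accents(text):
    # Decompose characters (NFKD), then drop the non-ASCII combining marks.
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')

print(strip_accents('Ćao, naïve café'))  # -> Cao, naive cafe
```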
Perhaps a better answer is to use unicodecsv instead.
----- EDIT ----- Okay, if you don't care whether the non-ASCII data is preserved at all, try the following:
# If row references a unicode string
b.create_from_csv_row(row.encode('ascii', 'ignore'))
If row is a collection rather than a single string, you will need to iterate down to the string level and re-encode each element.
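A minimal sketch of that iteration, assuming row is a list of strings (the names here are illustrative, not from the question's code):

```python
def ascii_row(row):
    # Encode each field to ASCII, silently dropping characters that
    # cannot be represented, then decode back to str.
    return [field.encode('ascii', 'ignore').decode('ascii') for field in row]

row = ['naïve', '42', 'café']
print(ascii_row(row))  # -> ['nave', '42', 'caf']
```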
Solution 2:
If you want to remove non-ASCII characters from your data, then iterate through your data and keep only the ASCII characters.
for item in data:
    if ord(item) < 128:  # code points 0 - 127 are ASCII
        [append, write, print, whatever]
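Applied to a single string, the same character-by-character check can be written as a comprehension - a sketch, assuming data is a str:

```python
def keep_ascii(data):
    # ord(ch) < 128 keeps only code points in the ASCII range (0-127).
    return ''.join(ch for ch in data if ord(ch) < 128)

print(keep_ascii('naïve café'))  # -> nave caf
```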
If you want to convert unicode characters to ascii, then the response above by DivinusVox is accurate.
Solution 3:
The pandas CSV parser (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html) supports different encodings:
import pandas
data = pandas.read_csv(myfile, encoding='utf-8', quotechar='"', delimiter=',')
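If pandas is not available, the standard-library csv module can also read a file in an explicit encoding - a minimal sketch that writes a small sample file ('data.csv' is a hypothetical filename) and then cleans each field with the encode/ignore trick from Solution 1:

```python
import csv

# Write a small sample file with non-ASCII fields (for demonstration only).
with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows([['name', 'city'], ['Željko', 'Zürich']])

# Opening with an explicit encoding makes csv.reader yield proper str fields,
# which can then be cleaned with any of the approaches above.
with open('data.csv', newline='', encoding='utf-8') as f:
    for row in csv.reader(f, delimiter=',', quotechar='"'):
        print([field.encode('ascii', 'ignore').decode('ascii') for field in row])
# -> ['name', 'city']
#    ['eljko', 'Zrich']
```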