
Removing Non-ASCII Characters In A CSV File

I am currently inserting data into my Django models using a CSV file. Below is a simple save function that I am using: def save(self): myfile = file.csv data = csv.reader(myfile, delimit…

Solution 1:

If you really want to strip it, try:

import unicodedata

# `title` is the unicode string read from the CSV; NFKD decomposition splits
# accented characters apart so the accents can then be dropped by 'ignore'.
unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')

WARNING: THIS WILL MODIFY YOUR DATA. It attempts to find a close ASCII match for each character, e.g. ć -> c
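For example, a quick check in a Python 2 shell (the input string here is just an illustrative value, not data from the question):

>>> import unicodedata
>>> unicodedata.normalize('NFKD', u'Ćao, café!').encode('ascii', 'ignore')
'Cao, cafe!'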

Perhaps a better answer is to use unicodecsv instead.
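As a rough sketch (assuming Python 2 and a file named file.csv, both hypothetical here), unicodecsv mirrors the stdlib csv API but decodes each field for you:

import unicodecsv

with open('file.csv', 'rb') as f:
    reader = unicodecsv.reader(f, encoding='utf-8', delimiter=',')
    for row in reader:
        print(row)  # each field is a unicode string, accents preserved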

----- EDIT ----- Okay, if you don't care whether the non-ASCII data is preserved at all, try the following:

# If row references a unicode string
b.create_from_csv_row(row.encode('ascii', 'ignore'))

If row is a collection rather than a unicode string, you will need to iterate down to the individual strings and re-encode each one, as in the sketch below.
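A minimal sketch, assuming row is a list of unicode fields (create_from_csv_row is the method from the question):

# Encode every field, silently dropping non-ASCII characters.
ascii_row = [field.encode('ascii', 'ignore') for field in row]
b.create_from_csv_row(ascii_row)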

Solution 2:

If you want to remove non-ASCII characters from your data, then iterate through your data and keep only the ASCII characters.

cleaned = []
for item in data:
    if ord(item) < 128:  # ASCII code points are 0-127
        cleaned.append(item)  # or write/print/whatever you need
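Applied to a whole CSV row, a minimal sketch (assuming each row is a list of unicode strings) could look like:

# Keep only ASCII characters in every field of the row.
clean_row = [''.join(ch for ch in field if ord(ch) < 128) for field in row]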

If you want to convert Unicode characters to ASCII, then the response above by DivinusVox is accurate.

Solution 3:

The pandas CSV parser (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html) supports different encodings:

import pandas

data = pandas.read_csv(myfile, encoding='utf-8', quotechar='"', delimiter=',')
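If the goal is still to strip non-ASCII characters afterwards, one possible follow-up (the column name 'title' is just a placeholder) could be:

# Hypothetical column name; str.encode applies the same ASCII-with-ignore
# conversion to every value in the column.
data['title'] = data['title'].str.encode('ascii', 'ignore')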
