Read A File In Python Having Rogue Byte 0xc0 That Causes Utf-8 And Ascii To Error Out
Trying to read a tab-separated file into a pandas dataframe:
>>> df = pd.read_table(fn, na_filter=False, error_bad_lines=False)
It errors out like so: b'Skipping line 58:
Solution 1:
Moving this answer here from another place where it got a hostile reception.
Found one encoding that actually accepts (meaning, doesn't error out on) byte 0xc0:
encoding="ISO-8859-1"
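Note: ISO-8859-1 maps every possible byte to a character, so decoding never fails. That also means you have to make sure the rest of the file doesn't have multi-byte unicode chars, because they would be silently garbled instead of raising an error. This may be helpful for folks like me who didn't have any unicode chars in their file anyway and just wanted python to load the damn thing when both the utf-8 and ascii encodings were erroring out.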
More on ISO-8859-1 : What is the difference between UTF-8 and ISO-8859-1?
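To make the trade-off in the note concrete, here is a minimal sketch using only the standard library. The sample bytes and the helper name `try_decode` are made up for illustration and are not from the original file. It shows the same line failing under ascii and utf-8 but decoding under ISO-8859-1, and what happens to genuine UTF-8 text when it is read the same way.

```python
# Hypothetical sample line containing the rogue byte 0xc0.
raw = b"caf\xc0\tvalue\n"

def try_decode(data, encoding):
    """Return the decoded text, or a short error string if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"

for enc in ("ascii", "utf-8", "ISO-8859-1"):
    print(enc, "->", repr(try_decode(raw, enc)))

# The catch: real UTF-8 text read as ISO-8859-1 does not fail, it just
# comes out wrong, because each byte becomes its own character.
garbled = "é".encode("utf-8").decode("ISO-8859-1")
print(repr(garbled))
```

Under ISO-8859-1 the byte 0xc0 becomes the character U+00C0 (À), so nothing is dropped; it simply ends up as an ordinary character in the data.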
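New command that works (note for newer setups: error_bad_lines was removed in pandas 2.0, where on_bad_lines='skip' is the replacement):
>>> df = pd.read_table(fn, na_filter=False, error_bad_lines=False, encoding='ISO-8859-1')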
After reading it in, the dataframe is fine: the columns and data all work like they did in OpenOffice Calc. I still have no idea where the offending 0xc0 byte went, but it doesn't matter as I've got the data I needed.
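For anyone who does want to track the byte down, one option is to scan the raw file before decoding. The sketch below is an assumption-laden illustration rather than part of the original answer: `find_byte` is a hypothetical helper, and the demo runs on a throwaway file instead of the real data. It reports the line number and byte offset of every 0xc0 it sees.

```python
import os
import tempfile

def find_byte(path, target=0xC0):
    """Return (line_number, byte_offset_in_line) for each occurrence of target."""
    hits = []
    with open(path, "rb") as fh:  # binary mode, so no decoding happens
        for lineno, line in enumerate(fh, start=1):
            for offset, value in enumerate(line):
                if value == target:
                    hits.append((lineno, offset))
    return hits

# Demo on a made-up tab-separated file with one rogue byte on line 3.
with tempfile.NamedTemporaryFile("wb", suffix=".tsv", delete=False) as tmp:
    tmp.write(b"id\tname\n1\tplain\n2\tbad\xc0byte\n")
    demo_path = tmp.name

hits = find_byte(demo_path)
os.remove(demo_path)
print(hits)
```

Since ISO-8859-1 decodes 0xc0 as À, the byte most likely survives in the dataframe as that character in whichever cell the reported line and offset point to.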