
How To Open An Ascii-encoded File As Utf8?

My files are in US-ASCII, and a command like a = file('main.html') followed by a.read() loads them as ASCII text. How do I get it to load as UTF-8? The problem I am trying to solve is: Un

Solution 1:

You are opening files without specifying an encoding, so in Python 2 you get a byte string, and Python falls back to the default encoding (ASCII) whenever it has to convert that byte string to unicode.

You need to decode the byte string explicitly, using the .decode() method:

template_str = template_str.decode('utf8')
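The line above is Python 2 code. A minimal Python 3 sketch of the same step, assuming a temporary file stands in for main.html: you can either read raw bytes and call .decode('utf-8') yourself, or pass encoding='utf-8' to open() and let it decode for you. The helper names read_decoded_manually and read_with_encoding are hypothetical.

```python
import os
import tempfile


def read_decoded_manually(path):
    # Read raw bytes, then decode them explicitly as UTF-8.
    with open(path, "rb") as f:
        return f.read().decode("utf-8")


def read_with_encoding(path):
    # Let open() perform the same decoding step for us.
    with open(path, encoding="utf-8") as f:
        return f.read()


# Write a small ASCII-only file standing in for main.html (assumption).
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as f:
    f.write(b"<p>Hello, {name}!</p>")

manual = read_decoded_manually(path)
automatic = read_with_encoding(path)
os.remove(path)

print(type(manual).__name__, manual == automatic)  # str True
```

Both routes yield the same str; passing encoding= to open() is usually the more idiomatic choice in Python 3.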

The val variable you tried to interpolate into your template is itself a unicode value. To combine it with your byte-string template (read from the file), Python automatically converts the template to unicode as well, and it uses the default encoding (ASCII) to do so.
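Python 3 no longer performs this implicit conversion: mixing bytes and str raises a TypeError straight away instead of silently decoding with a default codec. A small sketch, using concatenation in place of template interpolation; the names template_bytes and val are illustrative only.

```python
template_bytes = b"Hello, "  # what reading the file in binary mode gives you
val = "Zo\u00eb"             # a text (unicode) value containing a non-ASCII character

try:
    combined = template_bytes + val  # bytes + str is not allowed in Python 3
except TypeError as exc:
    print("TypeError:", exc)

# Decode the template explicitly first; then both operands are str.
template_str = template_bytes.decode("utf-8")
combined = template_str + val
print(combined)  # Hello, Zoë
```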

Did I mention that you should read Joel Spolsky's article on Unicode and the Python Unicode HOWTO? They will help you understand what happened here.

Solution 2:

A solution that works in Python 2:

import codecs
fo = codecs.open('filename.txt', 'r', 'ascii')
content = fo.read()  ## returns unicode
assert type(content) == unicode
fo.close()

utf8_content = content.encode('utf-8')
assert type(utf8_content) == str
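For comparison, a Python 3 sketch of the same idea. It assumes a temporary file stands in for filename.txt and uses the built-in open() with encoding= instead of codecs.open (which still exists but is rarely needed). In Python 3 the old unicode type is str and the old byte-string str type is bytes.

```python
import os
import tempfile

# Create a small ASCII file standing in for filename.txt (assumption).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"plain ASCII text")

# open() with an encoding decodes on read and returns str (text).
with open(path, "r", encoding="ascii") as fo:
    content = fo.read()
os.remove(path)
assert isinstance(content, str)

# Encoding text produces bytes.
utf8_content = content.encode("utf-8")
assert isinstance(utf8_content, bytes)

# For pure ASCII text, the UTF-8 bytes are identical to the ASCII bytes.
assert utf8_content == content.encode("ascii")
```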

Solution 3:

I assume you are sure that your files are encoded in ASCII. Are you? :) Because ASCII is a subset of UTF-8, you can decode this data as UTF-8 without problems. However, if you are sure that the data is pure ASCII, you should decode it as ASCII rather than UTF-8, so that any byte outside the ASCII range is reported as an error.
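A quick Python 3 check of both points: every byte value from 0 to 127 decodes identically under the two codecs, while a non-ASCII sequence (here the UTF-8 encoding of "é", chosen for illustration) is accepted by UTF-8 but rejected by the stricter ASCII codec.

```python
# Every byte value 0-127 decodes to the same characters under both codecs.
ascii_bytes = bytes(range(128))
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# A non-ASCII byte sequence is valid UTF-8...
data = b"caf\xc3\xa9"
print(data.decode("utf-8"))  # café

# ...but the ASCII codec refuses it, revealing the file is not really ASCII.
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    print("not ASCII:", exc.reason)
```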

"How do I get it to load as UTF8?"

I believe you mean "How do I get it to load as unicode?". Just decode the data using the ASCII codec and, in Python 2.x, the resulting data will be of type unicode. In Python 3, the resulting data will be of type str.

You will have to read about this topic to learn how to perform this kind of decoding in Python. Once you understand it, it is very simple.
