Skip to content Skip to sidebar Skip to footer

Python 3: Read Utf-8 File Containing German Umlaut

I searched and found many similar questions and articles but none would allow me to resolve the issue. I use Python 3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 6

Solution 1:

The accepted answer is too complicated. You just need to specify encoding for open:

fh = open(file_name, encoding='utf8')

And everything works without problems.

The answer to your other question:

Doesn't Python 3 use UTF-8 as a default everywhere?

Is "not when communicating with the external world (filesystem in this case), because it would be inconsistent with your OS". Specification says user's preferred encoding is locale dependent. Do

>>>import locale>>>locale.getpreferredencoding()

to see what it is on your system - most likely "cp something" on Windows, depending on the exact default codepage set. But you can always override with explicit encoding argument to open.

There, I hope you learned something new. :-)

Solution 2:

You are encoding to bytes after using codecs.open , just printing the data should give you want as you can see when we decode back:

In [31]: s = b'I would like the following characters to display correctly after reading this file into Python:\r\n\r\n\xc3\x84\xc3\x96\xc3\x9c\xc3\xa4\xc3\xb6\xc3\xbc\xc3\x9f\r\n'In [32]: print(s)
b'I would like the following characters to display correctly after reading this file into Python:\r\n\r\n\xc3\x84\xc3\x96\xc3\x9c\xc3\xa4\xc3\xb6\xc3\xbc\xc3\x9f\r\n'In [33]: print(s.decode("utf-8"))
I would like the following characters to display correctly after reading this file into Python:

ÄÖÜäöüß

If you are not seeing the correct output then it is your shell encoding that is the problem. The windows console encoding is not utf-8 so where you are running the code from and the shell encoding matters.

Post a Comment for "Python 3: Read Utf-8 File Containing German Umlaut"