Skip to content Skip to sidebar Skip to footer

Python: Convert Rtf File To Unicode?

I'm trying to convert lines in an RTF file to a series of unicode strings, and then do a regex match on the lines. (I need them to be unicode so that I can output them to another f

Solution 1:

You did not even decode the RTF file. RTFs are not just simple text files. A file containing "äöü", for example, contains this:

{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\fcharset0 Arial;}}

{*\generator Msftedit 5.41.15.1507;}\viewkind4\uc1\pard\f0\fs20\'e4\'f6\'fc\par

}

when opened in a text editor. So the characters "äöü" are encoded as windows-1252 as declared at the beginning of the file (äöü = 0xE4 0xF6 0xFC).

For reading RTF you'll first need something that converts RTF to text (already asked here).

Post a Comment for "Python: Convert Rtf File To Unicode?"