Python: Convert Rtf File To Unicode?
I'm trying to convert lines in an RTF file to a series of unicode strings, and then do a regex match on the lines. (I need them to be unicode so that I can output them to another f
Solution 1:
You did not even decode the RTF file. RTFs are not just simple text files. A file containing "äöü", for example, contains this:
{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\fcharset0 Arial;}}
{*\generator Msftedit 5.41.15.1507;}\viewkind4\uc1\pard\f0\fs20\'e4\'f6\'fc\par
}
when opened in a text editor. So the characters "äöü" are encoded as windows-1252 as declared at the beginning of the file (äöü = 0xE4 0xF6 0xFC).
For reading RTF you'll first need something that converts RTF to text (already asked here).
Post a Comment for "Python: Convert Rtf File To Unicode?"