Dealing With Windows Line-endings In Python
Solution 1:
Why are the DOS line endings a problem? Most things can deal with them just fine, including XML parsers. If you really want to get rid of them, open the file in universal newlines mode:
open(filename, 'rU')
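Python will convert all line endings to UNIX line endings for you. (In Python 3 the 'U' flag is deprecated and was removed in 3.11; plain text mode, open(filename, 'r'), already translates line endings by default.) If you really can't use that (which I find a little surprising), there's no way to get Python to do the work for you. You will have to open the file regardless, though, so your objection to #2 seems a little odd.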
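A minimal sketch of the same idea in Python 3, where text mode translates line endings on input when newline=None (the default). The helper name read_normalized and the throwaway temporary file are assumptions made for illustration, not something from the question.

```python
import os
import tempfile


def read_normalized(path):
    """Read a text file and let Python translate CRLF and CR to LF."""
    # newline=None (the default) turns on universal newlines for reading.
    with open(path, "r", newline=None, encoding="utf-8") as f:
        return f.read()


# Build a throwaway file with mixed DOS, old Mac and UNIX line endings.
fd, demo_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(b"<a>\r\n<b/>\r<c/>\n</a>\r\n")

normalized = read_normalized(demo_path)
os.remove(demo_path)
print(repr(normalized))
```

Passing newline='' instead would leave the original line endings untouched, which is useful when the file has to be written back byte for byte.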
Solution 2:
Are you opening the file in text mode or binary mode? I'm pretty sure I've counted on universal newlines on my Leopard install, but maybe I got an updated Python from somewhere too...
Anyway, I've seen this sort of thing bite many programmers in the bum, because they just reach for the 'b' key. Use 't' if you're opening text files known to be created on your platform, and 'U' instead of 't' if you need universal newlines.
with open(filename, 'rt') as f:
    content = f.read()
Edit: The comments note that 'rt' is the default. Fair point, but Python style tends to prefer explicit over implicit, so I'm going with that.
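To make the text-versus-binary point concrete, here is a small sketch that reads the same bytes both ways under Python 3. The file contents and the variable names are made up for the demonstration.

```python
import os
import tempfile

# Write a small file with DOS line endings, byte for byte.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"line one\r\nline two\r\n")

# Binary mode hands back exactly what is on disk, CRs included.
with open(sample_path, "rb") as f:
    raw = f.read()

# Text mode translates CRLF to LF while reading.
with open(sample_path, "rt", encoding="ascii") as f:
    text = f.read()

os.remove(sample_path)
print(repr(raw))
print(repr(text))
```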
Solution 3:
Allegedly: """This guy has \r\n right in the middle of tag descriptors like so: <ParentRedirec tSequenceID>""".
I see no \r\n here. Perhaps you mean repr(xml) contains things like
"<ParentRedirec\r\ntSequenceID>"
If not, try to say precisely what you mean, with repr-style examples.
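For anyone unsure what a repr-style example looks like, a tiny sketch; the string is just the tag quoted above with a CRLF assumed in the middle.

```python
xml = "<ParentRedirec\r\ntSequenceID>"

# print() renders the line break, so the \r\n itself is invisible.
print(xml)

# repr() spells the control characters out as escapes.
shown = repr(xml)
print(shown)
```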
The following should work:
>>> import re
>>> guff = """<atag>\r\n<bt\r\nag c="2">"""
>>> re.sub(r"(<[^>]*)\r\n([^>]*>)", r"\1\2", guff)
'<atag>\r\n<btag c="2">'
>>>
If there is more than one line break in a tag, e.g. <foo\r\nbar\r\nzot>, this will remove only one of them per pass (the last one, because the first group is greedy). Alternatives: (1) loop until the guff stops shrinking; (2) write a smarter regexp yourself :-)
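Both alternatives can be sketched roughly as follows. The function names are hypothetical, and this is still a regex hack: it treats anything between < and > as a tag, so comments, CDATA sections, or attribute values containing > would confuse it.

```python
import re

# Same pattern as above: removes one CRLF inside a <...> span per match.
TAG_BREAK = re.compile(r"(<[^>]*)\r\n([^>]*>)")


def strip_breaks_loop(text):
    """Alternative 1: re-apply the substitution until nothing shrinks."""
    while True:
        shorter = TAG_BREAK.sub(r"\1\2", text)
        if len(shorter) == len(text):
            return shorter
        text = shorter


def strip_breaks_callback(text):
    """Alternative 2: match each whole tag once, drop every CRLF in it."""
    return re.sub(r"<[^>]*>", lambda m: m.group(0).replace("\r\n", ""), text)


guff = "<atag>\r\n<foo\r\nbar\r\nzot>\r\ntext"
print(repr(strip_breaks_loop(guff)))
print(repr(strip_breaks_callback(guff)))
```

Line breaks between tags are left alone in both versions; only the ones inside a tag are removed.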
Solution 4:
What are you trying to do with this file? Whitespace between tags is usually ignored in XML, so the only place where line endings matter is inside the tags' content.
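As a quick check of how a parser treats DOS line endings in well-formed input, here is a sketch using the standard library's xml.etree.ElementTree. The sample document is invented; the XML specification requires parsers to normalize \r\n to \n before handing text to the application, so the content comes back with UNIX line endings.

```python
import xml.etree.ElementTree as ET

# A well-formed document saved with DOS line endings (illustrative data).
doc = b"<root>\r\n  <item>first\r\nsecond</item>\r\n</root>"

root = ET.fromstring(doc)
item_text = root.find("item").text

# The CRLF inside the element content arrives as a plain LF.
print(repr(item_text))
```

This only covers line endings in content and between tags; a break in the middle of a tag name, as described in the question, is a different problem and has to be repaired before parsing.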