CSV File With Quoted Comma Can't Be Correctly Split By Python
Solution 1:
CSV isn't a strictly standardized format, but it's common to escape quotation marks by doubling them ("") when they appear inside quoted text (e.g. "text"":""a,b,c,d"). Python's csv reader is doing the right thing here, because it assumes this convention. I'm not quite sure what output you expect, but here is my attempt at a very simple CSV reader that might suit your format. Feel free to adapt it accordingly.
raw = [
    '"1,2,3" , "4,5,6" , "456,789"',
    '"text":"a,b,c,d", "gate":"456,789"',
    '1,2, 3,'
]

for line in raw:
    i, quoted, row = 0, False, []
    for j, c in enumerate(line):
        if c == ',' and not quoted:
            # a comma outside quotes ends the current field
            row.append(line[i:j].strip())
            i = j + 1
        elif c == '"':
            # every quotation mark toggles the "inside quotes" state
            quoted = not quoted
    row.append(line[i:].strip())  # last field, after the final comma
    for k in range(len(row)):
        if len(row[k]) >= 2 and row[k][0] == '"' and row[k][-1] == '"':
            row[k] = row[k][1:-1]  # remove quotation marks
    print(row)
Output:
['1,2,3', '4,5,6', '456,789']
['text":"a,b,c,d', 'gate":"456,789']
['1', '2', '3', '']
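For comparison, here is a small sketch of what the standard csv module does with the doubled-quote convention mentioned above. The sample lines are made up for illustration (they are not from the question), and they have no spaces around the commas, which is the layout csv.reader expects by default.

```python
import csv

# Hypothetical sample lines, not taken from the question's data.
lines = [
    '"say ""hi"", ok",b',  # doubled quotes inside a quoted field
    '"1,2,3","4,5,6"',     # quoted commas, no spaces around the delimiter
]

# csv.reader accepts any iterable of strings, so a list is enough for a quick test.
parsed = list(csv.reader(lines))

for row in parsed:
    print(row)
```

Each "" inside a quoted field comes back as a single ", and the commas inside quotes stay part of the field.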
Solution 2:
Leaving this here for posterity, because I struggled with this for a bit too.
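The quotechar argument to csv.reader() helps resolve this; it lets the reader ignore delimiters (commas, in this scenario) that appear inside quotes, assuming every entry containing a comma has been quoted. Note that a quote is only recognized at the very start of a field, so when a space follows the comma, as in the sample below, skipinitialspace=True is needed as well. That is, it'll work for this: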
Name, Message
Ford Prefect, Imagine this fork as the temporal universe.
Arthur Dent, "Hey, I was using that!"
...where the message containing a comma is wrapped in quotes, but the message without a comma is not.
Demo code ripped from the Py2 docs, and edited so that delimiter is a comma (duh) and quotechar is your double-quote ":
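import csv

# Python 3: open in text mode with newline='' (the Py2 original used 'rb').
with open('eggs.csv', newline='') as csvfile:
    # skipinitialspace=True lets a quote that follows ", " still open a quoted field.
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"',
                            skipinitialspace=True)
    for row in spamreader:
        print(', '.join(row))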
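If you want to try this without creating eggs.csv, here is a rough sketch that parses the sample above from memory and compares the reader with and without skipinitialspace. The parse() helper is a made-up name for this example, and io.StringIO just stands in for the file.

```python
import csv
import io

# The sample from this answer, held in memory instead of a file.
data = (
    'Name, Message\n'
    'Ford Prefect, Imagine this fork as the temporal universe.\n'
    'Arthur Dent, "Hey, I was using that!"\n'
)

def parse(text, **kwargs):
    """Hypothetical helper: parse CSV text into a list of rows."""
    return list(csv.reader(io.StringIO(text), **kwargs))

default_rows = parse(data)                           # quotechar defaults to '"'
skipping_rows = parse(data, skipinitialspace=True)   # ignore the space after each comma

print(default_rows[2])
print(skipping_rows[2])
```

With the defaults, the last line is split into three pieces because the quote comes after a space and is treated as ordinary text. With skipinitialspace=True it comes back as two fields, with the comma kept inside the message.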