
Csv File With Quoted Comma Can't Be Correctly Split By Python

def csv_split():
    raw = [
        '"1,2,3" , "4,5,6" , "456,789"',
        '"text":"a,b,c,d", "gate":"456,789"'
    ]
    cr = csv.reader( raw, skipinitialspace=

Solution 1:

CSV isn't a strictly standardized format, but the common convention is to escape a quotation mark inside a quoted field by doubling it (e.g. "text"":""a,b,c,d"). Python's csv reader assumes this convention, so it is doing the right thing here. I'm not quite sure what you expect as output, but here is my attempt at a very simple CSV reader that might suit your format. Feel free to adapt it accordingly.

raw = [
    '"1,2,3" , "4,5,6" , "456,789"',
    '"text":"a,b,c,d", "gate":"456,789"',
    '1,2,  3,'
]

for line in raw:
    i, quoted, row = 0, False, []
    for j, c in enumerate(line):
        if c == ',' and not quoted:
            # an unquoted comma ends the current field
            row.append(line[i:j].strip())
            i = j + 1
        elif c == '"':
            # every quotation mark toggles the "inside quotes" state
            quoted = not quoted
    row.append(line[i:].strip())  # last field; also safe for an empty line
    for k in range(len(row)):
        if len(row[k]) >= 2 and row[k][0] == '"' and row[k][-1] == '"':
            row[k] = row[k][1:-1]  # remove quotation marks
    print(row)

Output:

['1,2,3', '4,5,6', '456,789']
['text":"a,b,c,d', 'gate":"456,789']
['1', '2', '3', '']
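
For comparison, here is a small sketch of the doubled-quote convention mentioned at the top of this answer, run through the standard csv module. The input line and the second field `plain` are made up for illustration; the point is only that a field wrapped in quotes, with each inner quote written twice, comes back as a single value with the commas intact.

```python
import csv

# One line that follows the doubled-quote convention: the whole first field
# is wrapped in quotes and each literal quote inside it is written twice.
# (Illustrative input, not taken from the question's file.)
lines = ['"text"":""a,b,c,d",plain']

# csv.reader accepts any iterable of strings, so a list works here.
rows = list(csv.reader(lines))

print(rows)  # [['text":"a,b,c,d', 'plain']]
```

If the file can be produced in this form, the hand-written splitter above shouldn't be needed.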

Solution 2:

Leaving this here for posterity, because I struggled with this for a bit too.

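The quotechar argument to csv.reader() helps resolve this; it lets the reader ignore delimiters (i.e. commas, in this scenario) when they're inside quotes, assuming every entry that contains a comma has been quoted. The double quote is already the default quotechar, so passing it explicitly mostly documents the intent. That is, it'll work for this: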

Name, Message
Ford Prefect, Imagine this fork as the temporal universe.
Arthur Dent, "Hey, I was using that!" 

...where the comma has been nested inside quotes, but the non-comma'd string has not.

Demo code ripped from the Py2 docs, and edited so that delimiter is a comma (duh) and quotechar is your double-quote ":

import csv
# Python 2 idiom; on Python 3, open the file with open('eggs.csv', newline='') instead.
with open('eggs.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in spamreader:
        print(', '.join(row))
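
One thing to watch for with the sample above: there is a space after each comma, and as far as I can tell the reader only treats a quote as a quote when it is the first character of the field. Below is a rough Python 3 sketch that parses the same three lines from an in-memory string (a stand-in for eggs.csv; the helper name `parse` is mine) with and without skipinitialspace=True, so you can see the difference.

```python
import csv
import io

# In-memory stand-in for eggs.csv, copied from the sample above.
# Note the space after each comma.
sample = (
    'Name, Message\n'
    'Ford Prefect, Imagine this fork as the temporal universe.\n'
    'Arthur Dent, "Hey, I was using that!"\n'
)

def parse(text, **kwargs):
    """Hypothetical helper: parse CSV text into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline=''), **kwargs))

# Without skipinitialspace the field starts with a space, so the quote is
# treated as ordinary text and the comma inside it still splits the field.
strict = parse(sample, delimiter=',', quotechar='"')

# With skipinitialspace=True the space after the delimiter is dropped and
# the quoted field is recognized as one value.
relaxed = parse(sample, delimiter=',', quotechar='"', skipinitialspace=True)

print(strict[2])   # three fields, quote characters left in
print(relaxed[2])  # ['Arthur Dent', 'Hey, I was using that!']
```

So if your file has "comma, space, quote" like the question's data does, skipinitialspace=True is probably what you want alongside quotechar.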
