Home > Net >  How to ignore commas in quotes when the entire field is not quoted while reading a csv in python?
How to ignore commas in quotes when the entire field is not quoted while reading a csv in python?

Time:06-14

I have data like, below and when reading as CSV, I don't want to consider comma when its within the quotes even if the quotes are not immediate to the separator (like record #2). 1 and 3 records are good if we use separator, but failing on 2nd record. I tried escapeCharacter but not working. Input:

col1, col2, col3
a, b, c
a, b1 "b2, b3" b4, c
"a1, a2", b, c

Expected output for 2nd record is:

  1. a
  2. b1 "b2, b3" b4
  3. c

Actual output:

  1. a
  2. b1 "b2
  3. b3" b4

CodePudding user response:

Updated

There might be a better solution. But top of my head, I can only think of this approach.

If you see pattern, the splitted sub_string will always be next to each other. So, after splitting, we can combine consecutive sub_strings if there exists ".

sample_strings = [
    'col1, col2, col3',
    'a, b, c',
    'a, b1 "b2, b3, test, test1, test2, test3" b4, c',
    '"a1, a2", b, c',
]


for string in sample_strings:
    splitted = string.split(', ')
    result = []
    to_escape = -1
    for index, value in enumerate(splitted):
        if index <= to_escape:
            continue

        if '"' in value:
            value = value   ', '
            index = index   1
            while '"' not in splitted[index]:
                value  = splitted[index]   ', '
                index  = 1
            value  = splitted[index]
            result.append(value)
            to_escape = index 

        else:
            result.append(value)
    
    print(result)

Output:

['col1', 'col2', 'col3']
['a', 'b', 'c']
['a', 'b1 "b2, b3, test, test1, test2, test3" b4', 'c']
['"a1, a2"', 'b', 'c']

CodePudding user response:

Any chance you could change the delimiter when creating the CSV files? I usually use semicolon instead of comma to avoid issues like that.

You can also tell python what's the delimiter in csv_reader = csv.reader(csv_file, delimiter=';')

  • Related