I have data like, below and when reading as CSV, I don't want to consider comma when its within the quotes even if the quotes are not immediate to the separator (like record #2). 1 and 3 records are good if we use separator, but failing on 2nd record. I tried escapeCharacter but not working. Input:
col1, col2, col3
a, b, c
a, b1 "b2, b3" b4, c
"a1, a2", b, c
Expected output for 2nd record is:
a
b1 "b2, b3" b4
c
Actual output:
a
b1 "b2
b3" b4
CodePudding user response:
Updated
There might be a better solution. But top of my head, I can only think of this approach.
If you see pattern, the splitted sub_string will always be next to each other. So, after splitting, we can combine consecutive sub_strings if there exists "
.
sample_strings = [
'col1, col2, col3',
'a, b, c',
'a, b1 "b2, b3, test, test1, test2, test3" b4, c',
'"a1, a2", b, c',
]
for string in sample_strings:
splitted = string.split(', ')
result = []
to_escape = -1
for index, value in enumerate(splitted):
if index <= to_escape:
continue
if '"' in value:
value = value ', '
index = index 1
while '"' not in splitted[index]:
value = splitted[index] ', '
index = 1
value = splitted[index]
result.append(value)
to_escape = index
else:
result.append(value)
print(result)
Output:
['col1', 'col2', 'col3']
['a', 'b', 'c']
['a', 'b1 "b2, b3, test, test1, test2, test3" b4', 'c']
['"a1, a2"', 'b', 'c']
CodePudding user response:
Any chance you could change the delimiter when creating the CSV files? I usually use semicolon instead of comma to avoid issues like that.
You can also tell python what's the delimiter in csv_reader = csv.reader(csv_file, delimiter=';')