Split based on commas but ignore commas within double-quotes

Time:08-03

I am trying to split strings on commas while ignoring the commas inside double quotes. Then I need to add the resulting pieces to a list.

line = "DATA", "LT", "0.40", "1.25", "Sentence, which contain, 
commas", "401", "", "MN", "", "", "", "", ""

When I try to do it with

lineItems = line.split(",")

It splits on every comma, including the ones inside the quotes.

However, when I split with a regex, I get everything back as a single element in the list (nothing is split).

Is there any way to get:

newlist  = ['DATA', 'LT', '0.40', '1.25', 'Sentence, which contain, 
    commas', '401', '', 'MN', '', '', '', '', '']

Thanks!

P.S. I will have many similar rows, so I want to get the same kind of result for all of them by iterating.
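Since every row in the question is quoted the way CSV is, the standard-library csv module can also do this split, and it handles the row-by-row iteration from the P.S. directly. A minimal sketch, assuming the data is already in a string (the second sample row is made up for illustration):

```python
import csv
import io

# Sample input in the question's format. The first row is the question's row;
# the second row is made up to show iteration over several rows.
text = (
    '"DATA", "LT", "0.40", "1.25", "Sentence, which contain, \n'
    'commas", "401", "", "MN", "", "", "", "", ""\n'
    '"DATA", "XY", "2.00", "3.50", "Another, made-up, sentence", "402", "", "MN", "", "", "", "", ""\n'
)

# skipinitialspace=True ignores the space after each comma, so the opening
# double quote of every field is recognised as quoting. Commas (and the line
# break) inside a quoted field then stay part of that field.
reader = csv.reader(io.StringIO(text), skipinitialspace=True)

rows = []
for row in reader:  # one list of strings per logical row
    rows.append(row)

for row in rows:
    print(row)
```

For a real file, the reader can wrap the file object instead of io.StringIO; the csv documentation recommends opening it with newline='' so that line breaks inside quoted fields are kept intact.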

Answer:

You could use the built-in shlex module, like so:

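import shlex
line = '"DATA", "LT", "0.40", "1.25", "Sentence, which contain, commas", "401", "", "MN", "", "", "", "", ""'

# shlex.split() honours the double quotes, so commas inside a quoted field survive,
# but every token except the last still ends with its separating comma ('DATA,').
tokens = shlex.split(line)
newlist = [x[:-1] for x in tokens[:-1]] + tokens[-1:]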
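The shlex approach depends on the layout of the question's data: shlex.split() breaks on whitespace, not on commas, so it only separates the fields because every separating comma is followed by a space. A quick check of that assumption, with made-up sample strings:

```python
import shlex

# Made-up sample: the same three quoted fields, once with a space after each
# separating comma (as in the question) and once without.
with_spaces = '"a", "b, c", "d"'
without_spaces = '"a","b, c","d"'

# shlex.split() breaks on whitespace only; quotes are removed but the
# separating commas stay attached to the neighbouring text.
print(shlex.split(with_spaces))     # ['a,', 'b, c,', 'd']
print(shlex.split(without_spaces))  # ['a,b, c,d']
```

So if some rows have no space after the separators, the fields run together into a single token.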

Answer:

You mentioned you tried to split a 'string' variable, so I assume you forgot to add the appropriate quotes around it. Is the following helpful, assuming the double quotes are balanced?

import re  # the standard-library re module is enough; the third-party regex package is not required

line = """ "DATA", "LT", "0.40", "1.25", "Sentence, which contain, 
commas", "401", "", "MN", "", "", "", "", "" """

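# Capture whatever sits between each pair of double quotes; the separating
# commas lie outside every match, so they never end up in the result.
l = re.findall(r'"([^"]*)"', line)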

print(l)

Prints:

['DATA', 'LT', '0.40', '1.25', 'Sentence, which contain, \ncommas', '401', '', 'MN', '', '', '', '', '']
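The same regex idea also works as an actual split, which is closer to what the question first tried: re.split() just needs a pattern that only matches commas outside quotes. A sketch of the common lookahead trick, again assuming balanced, unescaped double quotes (the names OUTSIDE_QUOTES_COMMA and unquote are illustrative choices):

```python
import re

line = '"DATA", "LT", "0.40", "1.25", "Sentence, which contain, \ncommas", "401", "", "MN", "", "", "", "", ""'

# A comma (plus any whitespace after it) is a separator only if an even number
# of double quotes follows it up to the end of the string, i.e. it is not
# inside a quoted field. Assumes balanced, unescaped double quotes.
OUTSIDE_QUOTES_COMMA = re.compile(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)')


def unquote(field):
    """Remove one pair of surrounding double quotes, if present."""
    if len(field) >= 2 and field[0] == field[-1] == '"':
        return field[1:-1]
    return field


newlist = [unquote(part) for part in OUTSIDE_QUOTES_COMMA.split(line)]
print(newlist)
```

The lookahead rescans the rest of the line for every comma, which is fine for rows of this size but grows quadratically with very long lines.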