I have the following lines from a CSV file:
315,"Misérables, Les (1995)",Drama|War
315,Big Bully (1996),Comedy|Drama
I want to split each line into a list of 3 elements, so I need a general regex that splits wherever it encounters ','. However, a title may itself contain a comma (as shown in the first line), so the split has to skip commas inside the title. A title that contains commas is also wrapped in quotation marks, but I need the expression to work for both cases. Is it possible to do this with a regex?
I'm trying to learn regex by myself and I'm having difficulty understanding some cases. I would really appreciate your help!
CodePudding user response:
If you're trying to parse a .csv file, don't do it by hand. Python already has libraries that will do it for you, such as the csv module in the standard library.
Otherwise, if your string has quotation marks only when there is a comma inside the title, you can do it like this:
>>> x = '315,"Misérables, Les (1995)",Drama|War'
>>> y = '315,Big Bully (1996),Comedy|Drama'
>>> x
'315,"Misérables, Les (1995)",Drama|War'
>>> y
'315,Big Bully (1996),Comedy|Drama'
>>> x.split('"') if len(x.split('"')) == 3 else x.split(',')
['315,', 'Misérables, Les (1995)', ',Drama|War']
>>> y.split('"') if len(y.split('"')) == 3 else y.split(',')
['315', 'Big Bully (1996)', 'Comedy|Drama']
When the line is split on a quotation mark, this leaves a trailing comma on the first part and a leading comma on the last part, so you will have to remove them afterwards.
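The cleanup step mentioned above could look roughly like this. It is only a sketch: split_record is a hypothetical helper name, and it assumes that at most one field per line (the title) is quoted and that the title contains no quotation marks of its own.

```python
def split_record(line):
    """Split one record into its fields (hypothetical helper, not from the original answer)."""
    parts = line.split('"')
    if len(parts) == 3:
        # Quoted title: drop the trailing comma from the first part
        # and the leading comma from the last part.
        return [parts[0].rstrip(','), parts[1], parts[2].lstrip(',')]
    # No quotation marks: a plain split on commas is enough.
    return line.split(',')


x = '315,"Misérables, Les (1995)",Drama|War'
y = '315,Big Bully (1996),Comedy|Drama'

fields_x = split_record(x)
fields_y = split_record(y)
print(fields_x)
print(fields_y)
```

For anything beyond this simple shape (several quoted fields, escaped quotes), a real CSV parser is the safer choice.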
CodePudding user response:
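Actually, you do not need a regex for this problem. The quoting support in Python's csv module will solve it, because a comma inside a quoted field is not treated as a delimiter.
For example, assuming csv has been imported and csv_input_file is an already opened file object: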
filereader = csv.reader(csv_input_file, delimiter=',', quotechar='"')
Give it a try and see whether it solves your problem.
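A self-contained version of the same idea might look like the sketch below. To keep it runnable without a file on disk, an io.StringIO object stands in for the opened file, and the sample data is the two lines from the question. With a real file you would typically open it with newline='' and pass that file object instead.

```python
import csv
import io

# Sample data taken from the question; io.StringIO is an assumption
# standing in for a real opened CSV file.
data = (
    '315,"Misérables, Les (1995)",Drama|War\n'
    '315,Big Bully (1996),Comedy|Drama\n'
)
csv_input_file = io.StringIO(data)

# The reader handles quoted fields, so the comma inside the title is kept.
filereader = csv.reader(csv_input_file, delimiter=',', quotechar='"')
rows = list(filereader)

for row in rows:
    print(row)
```

Each row comes back as a list of 3 strings, with the quotation marks already removed from the title.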