I'm trying to parse a file containing a list of movies, where each line looks like:
id,title (year),genre1|genre2|genre3
The year field is optional, and some movies have other parts of the title in parentheses.
So far I have this regex:
(?:^\s*(\d+)\s*,.*?)(?:.*?\((\d{4})\))?(?:.*,\s*(.*)$)
How can I improve it to capture the title, which sits between the id and the optional year (or the genres if there is no year)?
Data example:
1,Ace Ventura: When Nature Calls(1995),Comedy
20,Money Train (1995),Action|Comedy|Crime|Drama|Thriller
21,Get Shorty (1995),Comedy|Crime|Thriller
22,Copycat ,Crime|Drama|Horror|Mystery|Thriller
23,Assassins (1995),Action|Crime|Thriller
24,"Powder (1995)",Drama|Sci-Fi
25,Leaving (5) Las Vegas ,Drama|Romance
CodePudding user response:
The year always comes right before a comma, so don't put .* between the year and that comma. Capture the title lazily after the first comma instead:
^\s*(\d+)\s*,(.*?)(?:\((\d{4})\))?\s*,\s*(.*)$
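
To see how the groups come out, here is a minimal sketch in Python (the question does not name a language, so that choice is an assumption). It uses the pattern above with the id quantifier written as \d+; the names MOVIE_RE and parse_movie are made up for illustration. Because the lazy title group can pick up the space before the year, the sketch strips the title afterwards.

```python
import re

# Pattern from the answer: id, lazy title, optional (year), genres.
MOVIE_RE = re.compile(r'^\s*(\d+)\s*,(.*?)(?:\((\d{4})\))?\s*,\s*(.*)$')

def parse_movie(line):
    """Return (id, title, year or None, genres) or None if the line does not match."""
    m = MOVIE_RE.match(line)
    if m is None:
        return None
    movie_id, title, year, genres = m.groups()
    # The lazy title may keep a trailing space (e.g. "Money Train "), so strip it.
    return int(movie_id), title.strip(), int(year) if year else None, genres.split('|')

samples = [
    '1,Ace Ventura: When Nature Calls(1995),Comedy',
    '20,Money Train (1995),Action|Comedy|Crime|Drama|Thriller',
    '22,Copycat ,Crime|Drama|Horror|Mystery|Thriller',
    '24,"Powder (1995)",Drama|Sci-Fi',
    '25,Leaving (5) Las Vegas ,Drama|Romance',
]

for line in samples:
    print(parse_movie(line))
```

Note that for the quoted row (id 24) the year is followed by a closing quote rather than a comma, so this pattern leaves the year unset and keeps the whole quoted string, quotes included, as the title. If quoted titles matter, that row needs separate handling.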