How to fix regex for parsing CSV data to work with empty substrings

Time:07-28

How to change the following regex:

(?:(^|,)(?<quote>"|)(?<value>.*?)(\k<quote>)(?=(,|$)))

which works with: 1,1,-1 ... I get "1","1","-1"

and works with: "1","1","-1" ... I get "1","1","-1"


but it doesn't work as expected when one or more of the substrings are empty:

,1,-1 ...in such case I need to get: "", "1", "-1"

,"1","-1" ...in such case I need to get: "", "1", "-1"

,"1", ...in such case I need to get: "", "1", ""

,, ...in such case I need to get: "","",""

Is that possible?

CodePudding user response:

You can use

(?<=,|^)(?<quote>"?)(?<value>.*?)\k<quote>(?=,|$)

See the regex demo.

Details:

  • (?<=,|^) - start of string or a location right after a comma
  • (?<quote>"?) - an optional double quote captured into Group "quote"
  • (?<value>.*?) - Group "value": zero or more chars other than line break chars, as few as possible
  • \k<quote> - same char as in Group "quote"
  • (?=,|$) - a location right before a comma or end of string.
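To try the pattern against the inputs from the question, here is a minimal sketch in Python. It is an adaptation, not the original code: the question does not name a language, and the syntax above looks like a .NET or JavaScript flavor. Python's re module spells named groups as (?P<name>...) and backreferences as (?P=name), and it only accepts fixed-width lookbehinds, so (?<=,|^) is rewritten as (?:(?<=,)|^). The helper name parse_fields is hypothetical. The loop steps forward by hand after an empty match, on the assumption that this mirrors how the original engine iterates; re.finditer is avoided because it allows a non-empty match to start right where an empty one was found, which would merge a leading empty field with the next one.

```python
import re

# Adapted pattern (assumption: the original targets a .NET/JavaScript-style engine).
# (?:(?<=,)|^)   start of string, or a position right after a comma
# (?P<quote>"?)  optional opening double quote
# (?P<value>.*?) the field value, as short as possible
# (?P=quote)     the same text as the opening quote (a quote or nothing)
# (?=,|$)        a position right before a comma or the end of the string
FIELD = re.compile(r'(?:(?<=,)|^)(?P<quote>"?)(?P<value>.*?)(?P=quote)(?=,|$)')


def parse_fields(line):
    """Return the field values of one CSV line; empty fields come back as ''."""
    values = []
    pos = 0
    while pos <= len(line):
        match = FIELD.search(line, pos)
        if match is None:
            break
        values.append(match.group("value"))
        if match.end() == match.start():
            # Empty match: move one character on so the next search
            # cannot report the same position again.
            pos = match.end() + 1
        else:
            pos = match.end()
    return values


if __name__ == "__main__":
    for sample in ['1,1,-1', '"1","1","-1"', ',1,-1', ',"1","-1"', ',"1",', ',,']:
        print(repr(sample), "->", parse_fields(sample))
```

With this loop, each of the six inputs from the question yields three values, and the empty fields show up as empty strings.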