Regex to parse comma delimited line where each element may contain comma or double quote

Time:08-27

I'm trying to find a regex I could use to parse a line like:

"hello","here, I am","Building "A" and more","Building "B", Indiana"

I expect to find:

  • "hello"
  • "here, I am"
  • "Building "A" and more"
  • "Building "B", Indiana"

The regex (?:^|\,)(\"(?:[^\"]\,)*\"|[^\,]*) correctly parses elements that contain double quotes (such as "Building "A" and more"), and the regex (?:^|\,)(\"(?:[^\"]\,?)*\"|[^\,]*) parses elements that contain a comma (such as "here, I am"). However, I'm having a hard time finding a single expression that correctly parses both kinds of element as well as the last one, which contains both a comma and double quotes. Note that an element may contain more than one comma and more than one double quote.

I will use this regex in a C# .NET Core 6 application.

CodePudding user response:

What about this regex:

"'(?<v1>.+?)'(?=,')|'(?<v2>.+)'"g

Note: I've used single quotes instead of double quotes only so that the pattern works inside Regex101.

Explanation:

  1. '(?<v1>.+?)'(?=,') matches a quoted element that is followed by a comma and an opening quote (first priority)
  2. '(?<v2>.+)' matches a quoted element that is not followed by a comma, which should be a match too (second priority)
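
As a rough sketch of how the two alternatives above could be applied from C# (.NET 6), the snippet below swaps the single quotes back to double quotes and reads whichever named group matched. It assumes the quantifiers are .+? and .+; the helper name ParseQuotedLine and the top-level-statement layout are hypothetical and not part of the original answer. The approach also seems to rely on the sequence "," never appearing inside an element and on no element being empty.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (an assumption for illustration): applies the answer's
// pattern with double quotes in place of the single quotes used on Regex101.
static List<string> ParseQuotedLine(string line)
{
    // v1: a quoted element followed by ,"  (tried first)
    // v2: the remaining quoted element, expected to be the last one on the line
    const string pattern = "\"(?<v1>.+?)\"(?=,\")|\"(?<v2>.+)\"";

    var fields = new List<string>();
    foreach (Match m in Regex.Matches(line, pattern))
    {
        // Exactly one of the two named groups takes part in each match.
        fields.Add(m.Groups["v1"].Success ? m.Groups["v1"].Value
                                          : m.Groups["v2"].Value);
    }
    return fields;
}

// The sample line from the question.
var line = "\"hello\",\"here, I am\",\"Building \"A\" and more\",\"Building \"B\", Indiana\"";

foreach (var field in ParseQuotedLine(line))
{
    Console.WriteLine(field);
}
```

For the sample line this should print the four elements without their outer quotes, with the inner quotes and commas preserved. If the outer quotes are needed, m.Value holds the full match including them.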

Regex101
