Regex search character inside capturing group

Time:09-30

I have a LOB containing the rows of a CSV file, where each column is delimited by a semicolon. Some of the columns are strings delimited by pipes, and these may themselves contain a semicolon. I need to replace that semicolon with a comma, but only inside the pipe-delimited string columns; otherwise the column order will be destroyed.

Example of a row:

1;4;|1.Simple response|;|once upon a time; I used to...|;|my favorite character is ; I really love it.|

Expected result:

1;4;|1.Simple response|;|once upon a time, I used to...|;|my favorite character is , I really love it.|

This is the regex I wrote: (\|)(.*?)(\|[\n\;])


What I need is to replace that .*? with [;], but when I try, nothing is matched. I don't understand how to match with a regex inside an already captured group.

Any advice?

Thanks

CodePudding user response:

I have a LOB containing the rows of a CSV file, where each column is delimited by a semicolon. Some of the columns are strings delimited by pipes, and these may themselves contain a semicolon. I need to replace that semicolon with a comma, but only inside the pipe-delimited string columns; otherwise the column order will be destroyed.

1;4;|1.Simple response|;|once upon a time; I used to...|;|my favorite character is ; I really love it.|

Any advice?


"The XY problem is asking about your attempted solution rather than your actual problem." (The XY Problem)

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." (Regular Expressions: Now You Have Two Problems)


You have a nicely formatted file and you want to mangle it with regular expressions. Why?


Your file is an example of Delimiter-separated values.

Your file is almost identical to Comma-separated values.

The Go csv package reads and writes CSV files; the package supports the format described in RFC 4180.


When we receive so-called CSV files they often don't conform to RFC 4180. We often run a cleaning step to enforce conformity.

For example, for your file we want to:

  • replace each " character enclosed in a string with two ("") to escape it.
  • replace the ; field delimiter characters with ,
  • replace the | string delimiter characters with "
  • replace each enclosed || escape sequence with a single | character.

The Go csv package Reader Comma option supports changing the field delimiter character to ;. The csv package does not support changing the string quote character from " to |.
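A minimal sketch of the Comma option, assuming the strings are already enclosed in double quotes. The helper name readSemicolonCSV and the sample row are made up for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readSemicolonCSV parses semicolon-delimited input whose string fields
// are enclosed in double quotes, as the csv package expects.
// (Hypothetical helper name, for illustration only.)
func readSemicolonCSV(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = ';' // the field delimiter is configurable; the quote character is not
	return r.ReadAll()
}

func main() {
	rows, err := readSemicolonCSV(`1;4;"once upon a time; I used to..."` + "\n")
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	for _, row := range rows {
		// The semicolon inside the quoted field does not split the field.
		fmt.Printf("%d fields: %q\n", len(row), row)
	}
}
```

With this option, only the pipe handling needs to be cleaned up before the reader can parse the data.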


Here is the input and output from a simple DSV to CSV Go program that cleans your file for input into the Go csv package:

$ go run dsvtocsv.go file.dsv file.csv 
$ cat file.dsv
1;4;|1.Simple response|;|once upon a time; I used to...|;|my favorite character is ; I really love it.|;||||;|"|;the end
$ cat file.csv
1,4,"1.Simple response","once upon a time; I used to...","my favorite character is ; I really love it.","|","""",the end
$ 
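The program itself is not shown above. As a rough sketch of how such a cleaning step might look (this is not the actual dsvtocsv.go; the function name dsvToCSV is hypothetical and file handling is omitted), a single pass that tracks whether it is inside a pipe-delimited string can apply the four replacements. It assumes unquoted fields contain no , or " characters.

```go
package main

import (
	"fmt"
	"strings"
)

// dsvToCSV converts semicolon-delimited text with pipe-quoted strings
// into comma-delimited text with double-quoted strings.
// Assumption: unquoted fields contain no ',' or '"' characters.
func dsvToCSV(text string) string {
	var out strings.Builder
	inString := false // true while between an opening and closing pipe
	for i := 0; i < len(text); i++ {
		c := text[i]
		switch {
		case c == '|' && !inString:
			inString = true
			out.WriteByte('"') // opening pipe becomes opening quote
		case c == '|' && inString:
			if i+1 < len(text) && text[i+1] == '|' {
				out.WriteByte('|') // "||" inside a string is one literal pipe
				i++
			} else {
				inString = false
				out.WriteByte('"') // closing pipe becomes closing quote
			}
		case c == '"' && inString:
			out.WriteString(`""`) // escape an enclosed quote by doubling it
		case c == ';' && !inString:
			out.WriteByte(',') // field delimiter
		default:
			out.WriteByte(c) // everything else, including ';' inside strings
		}
	}
	return out.String()
}

func main() {
	row := `1;4;|1.Simple response|;|once upon a time; I used to...|;||||;|"|;the end`
	fmt.Println(dsvToCSV(row))
}
```

Because the pass works on the whole text rather than line by line, a newline inside a pipe-delimited string is copied through unchanged.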

CodePudding user response:

It appears that the third part of the pattern (second pipe |) won't match because it must always be followed by a newline (\n) or semicolon (;), which is not the case with your input.

Did you mean something like this:

\|;([\|\n;]) //would allow newline OR semicolon OR second pipe

https://regex101.com/r/S9MrWe/1
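If a regex is still preferred, one option is to match each pipe-delimited string as a whole and then replace the semicolons inside the matched text with a callback, rather than trying to match inside a captured group. A minimal sketch in Go (the question does not name a language, so that choice is an assumption, as is the function name replaceInsideStrings); it assumes the strings contain no escaped pipes:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pipeString matches one pipe-delimited string: a pipe, any run of
// non-pipe characters, and the closing pipe.
var pipeString = regexp.MustCompile(`\|[^|]*\|`)

// replaceInsideStrings replaces every ';' with ',' inside pipe-delimited
// strings only; semicolons between fields are left untouched.
// (Hypothetical helper name, for illustration only.)
func replaceInsideStrings(row string) string {
	return pipeString.ReplaceAllStringFunc(row, func(m string) string {
		return strings.ReplaceAll(m, ";", ",")
	})
}

func main() {
	row := `1;4;|1.Simple response|;|once upon a time; I used to...|;|my favorite character is ; I really love it.|`
	fmt.Println(replaceInsideStrings(row))
}
```

For the example row this yields the expected result from the question, with the field-separating semicolons preserved.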
