I have a lob describing the rows of a CSV, where each column is delimited by a semicolon. Some of those columns are strings, delimited by pipes, which may contain a semicolon. I need to replace that semicolon with a comma, but only inside the pipe delimiters used for string columns; otherwise the column order will be destroyed.
Example of a row:
1;4;|1.Simple response|;|once upon a time; I used to...|;|my favorite character is ; I really love it.|
Expected result:
1;4;|1.Simple response|;|once upon a time, I used to...|;|my favorite character is , I really love it.|
This is the regex I wrote:
(\|)(.*?)(\|[\n\;])
What I need is to replace the semicolons inside the part matched by .*? (something like [;]), but when I try, nothing is matched. I don't understand how to match with a regex inside an already captured group.
Any advice?
Thanks
CodePudding user response:
I have a lob describing the rows of a CSV, where each column is delimited by a semicolon. Some of those columns are strings, delimited by pipes, which may contain a semicolon. I need to replace that semicolon with a comma, but only inside the pipe delimiters used for string columns; otherwise the column order will be destroyed.
1;4;|1.Simple response|;|once upon a time; I used to...|;|my favorite character is ; I really love it.|
Any advice?
"The XY problem is asking about your attempted solution rather than your actual problem." (The XY Problem)
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." (Regular Expressions: Now You Have Two Problems)
You have a nicely formatted file and you want to mangle it with regular expressions. Why?
Your file is an example of Delimiter-separated values.
Your file is almost identical to Comma-separated values.
The Go csv package reads and writes CSV files; the package supports the format described in RFC 4180.
When we receive so-called CSV files they often don't conform to RFC 4180. We often run a cleaning step to enforce conformity.
For example, for your file we want to:
- replace enclosed " escape characters with two ("") escape characters.
- replace ; field delimiter characters with ,
- replace | string escape characters with "
- replace enclosed || escape characters with one (|) escape character.
The Go csv package Reader Comma option supports changing the field delimiter character to ; but the csv package does not support changing the string escape character to |.
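To illustrate why the Comma option alone is not enough, here is a minimal sketch. The helper name readSemicolon is hypothetical, and the input is a shortened version of the row from the question. The reader splits on every semicolon because it does not treat | as a quote character:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readSemicolon (hypothetical helper) parses s with ';' as the field delimiter.
// The csv package still only recognizes '"' as the quote character.
func readSemicolon(s string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(s))
	r.Comma = ';'
	return r.ReadAll()
}

func main() {
	recs, err := readSemicolon("1;4;|once upon a time; I used to...|\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, rec := range recs {
		// The pipe-delimited string is split in two: 4 fields instead of 3.
		fmt.Printf("%d fields: %q\n", len(rec), rec)
	}
}
```

The semicolon inside the pipes is treated as a field delimiter, which is why the pipes need to be turned into double quotes first.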
Here is the input and output from a simple DSV to CSV Go program that cleans your file for input into the Go csv package:
$ go run dsvtocsv.go file.dsv file.csv
$ cat file.dsv
1;4;|1.Simple response|;|once upon a time; I used to...|;|my favorite character is ; I really love it.|;||||;|"|;the end
$ cat file.csv
1,4,"1.Simple response","once upon a time; I used to...","my favorite character is ; I really love it.","|","""",the end
$
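The dsvtocsv.go source is not shown here. As a rough sketch of how the cleaning rules above could be applied to a single line (not the actual program), the following uses a hypothetical function dsvToCSV. It assumes strings never span lines and that unquoted fields contain no commas or double quotes:

```go
package main

import (
	"fmt"
	"strings"
)

// dsvToCSV (hypothetical) rewrites one ';'-delimited, '|'-quoted line
// as a ','-delimited, '"'-quoted line.
func dsvToCSV(line string) string {
	var b strings.Builder
	inString := false
	for i := 0; i < len(line); i++ {
		c := line[i]
		switch {
		case inString && c == '|':
			if i+1 < len(line) && line[i+1] == '|' {
				b.WriteByte('|') // "||" inside a string is one literal pipe
				i++
			} else {
				b.WriteByte('"') // closing pipe becomes a closing quote
				inString = false
			}
		case inString && c == '"':
			b.WriteString(`""`) // double any quote inside a string
		case inString:
			b.WriteByte(c) // semicolons inside a string are kept as-is
		case c == '|':
			b.WriteByte('"') // opening pipe becomes an opening quote
			inString = true
		case c == ';':
			b.WriteByte(',') // field delimiter outside a string
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	row := `1;4;|1.Simple response|;|once upon a time; I used to...|;||||;|"|;the end`
	fmt.Println(dsvToCSV(row))
}
```

The result can then be read with the csv package using its default settings.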
CodePudding user response:
It appears that the third part of the pattern (second pipe |) won't match because it must always be followed by a newline (\n) or semicolon (;), which is not the case with your input.
Did you mean something like this:
\|;([\|\n;])
//would allow newline OR semicolon OR second pipe
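As for replacing inside a match: one common approach is to match each pipe-delimited string as a whole and run the replacement only on the matched text. Below is a minimal sketch in Go (assumed here because the other answer uses Go). The name replaceInside is hypothetical, and the pattern assumes a string never contains a pipe itself:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pipeString matches one pipe-delimited string, assuming no pipes inside it.
var pipeString = regexp.MustCompile(`\|[^|]*\|`)

// replaceInside (hypothetical) turns ';' into ',' only within pipe-delimited strings.
func replaceInside(row string) string {
	return pipeString.ReplaceAllStringFunc(row, func(m string) string {
		return strings.ReplaceAll(m, ";", ",")
	})
}

func main() {
	row := "1;4;|1.Simple response|;|once upon a time; I used to...|;|my favorite character is ; I really love it.|"
	fmt.Println(replaceInside(row))
	// 1;4;|1.Simple response|;|once upon a time, I used to...|;|my favorite character is , I really love it.|
}
```

The semicolons between columns fall outside every match, so they are left untouched.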