Update: Please keep in mind that regex is my only option.
Update 2: Actually, I can use a bash based solution as well.
I'm trying to replace the pipes (there can be more than one) that are between double quotes with commas, using a Perl regex.
Example:
continuer|"First, Name"|123|12412|10/21/2020|"3|7"||Yes|No|No|
Expected output (3 and 7 are separated by a comma):
continuer|"First, Name"|123|12412|10/21/2020|"3,7"||Yes|No|No|
There may be more digits; it is not always just the two digits in \d\|\d. It could be "3|7|2", and the correct output for that one has to be "3,7,2". I've tried the following:
cat <filename> | perl -pi -e 's/"\d+\|[\|\d]+/\d+,[\|\d]+/g'
but it just puts in the literal string (d and so on) instead of the matched digits.
I'd really appreciate your help. Thanks!
Answer:
If it must be a regex, here is a simpler one:
perl -wpe's/("[^"]+")/ $1 =~ s{\|}{,}gr /eg' file
Not bullet-proof, but it should work for the shown use case.†
Explanation: with the /e modifier, the replacement side is evaluated as code. There, a regex runs on $1 under /r, so that the original ($1) is unchanged; the $N variables are read-only, so we can't change $1 and thus couldn't run a "normal" s/// on it. With /r the changed string is returned, or the original if there were no changes. Just as ordered.
Once it's tested well enough, add -i to change the input file "in-place" if wanted.
I must add that I see no reason why at least this part of the job can't be done using a CSV parser.
† Tested with strings like the one in the question, extended only as far as this:
con|"F, N"|12|10/21|"3|7"||Yes|"2||4|12"|"a|b"|No|""|end|
Answer:
If you cannot install modules, Text::ParseWords is a core module you can try. It can split a string and handle quoted delimiters.
use Text::ParseWords;
my $q = q(continuer|"First, Name"|123|12412|10/21/2020|"3|7"||Yes|No|No|);
print join "|", map { tr/|/,/; $_ } quotewords('\|', 1, $q);
As a one-liner, it would be:
perl -MText::ParseWords -pe'$_ = join "|", map { tr/|/,/; $_ } quotewords(q{\|}, 1, $_);' yourfile.txt
Answer:
You said "Update 2: Actually, I can use a bash based solution as well." While this script isn't bash, you could call it from bash (or any other shell), which I assume is what you really mean by "bash based", so this will work using any awk in any shell on every Unix box:
$ awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/\|/,",",$i)} 1' file
continuer|"First, Name"|123|12412|10/21/2020|"3,7"||Yes|No|No|
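The loop works because splitting on the double quote puts every quoted segment into an even-numbered field ($2, $4, ...), assuming the quotes are balanced and no field contains an escaped quote. A sketch that prints the fields so this can be checked, plus the same awk command as a function; both function names are hypothetical:

```shell
# Print each "-separated field with its number; quoted text lands in even fields.
show_awk_fields() {
    awk 'BEGIN{FS="\""} {for (i=1; i<=NF; i++) printf "%d: %s\n", i, $i}'
}

# The awk command from this answer: gsub() runs only on the even fields.
fix_with_awk() {
    awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/\|/,",",$i)} 1' "$@"
}

printf '%s\n' 'a|"3|7"|b' | show_awk_fields
# 1: a|
# 2: 3|7
# 3: |b
printf '%s\n' 'a|"3|7"|b' | fix_with_awk    # prints: a|"3,7"|b
```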
Imagine yourself having to debug or enhance the clear, simple loop above vs. the regexp incantation you posted in your answer:
's/(?:(?<=")|\G(?!^))(\s*[^"|\s]+(?:\s+[^"|\s]+)*)\s*\|\s*(?=[^"]*")/$1,/g'
I'm sure you could do what I'm doing with awk above natively in Perl instead, if you're trying to modify a Perl script to add this functionality.
Answer:
I'd use a CSV parser, not regular expressions:
#!/usr/bin/env perl
use warnings;
use strict;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new({ binary => 1, sep_char => "|"});
while (my $row = $csv->getline(*ARGV)) {
@$row = map { tr/|/,/r } @$row;
$csv->say(*STDOUT, $row);
}
Example:
$ perl demo.pl input.txt
continuer|"First, Name"|123|12412|10/21/2020|3,7||Yes|No|No|
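More verbose, but also more robust and a lot easier to understand. As the output shows, the quotes around 3,7 are not carried over: Text::CSV_XS writes each field back using its own quoting rules instead of reproducing the input's quotes.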
Answer:
I'd use Text::CSV_XS.
perl -MText::CSV_XS=csv -e'
csv
in => \*ARGV,
sep_char => "|",
on_in => sub { tr/|/,/ for @{ $_[1] } };
'
You can provide the file name as an argument or provide the data via STDIN.
Answer:
This is working right now:
's/(?:(?<=")|\G(?!^))(\s*[^"|\s]+(?:\s+[^"|\s]+)*)\s*\|\s*(?=[^"]*")/$1,/g'
Credit goes to my boss at work.
Thanks everyone for looking.
I hope some of you realize that some projects require doing things a certain way, and complicating an already very complicated pre-existing structure is not always an option at work. I knew there would be a one-liner for this; please don't hate on it just because you didn't like that.