Regex (or bash), get pipes between quotes (perl)

Update: Please keep in mind that regex is my only option.

Update 2: Actually, I can use a bash based solution as well.

I'm trying to replace the pipes (there can be more than one) that are between double quotes with commas, using a Perl regex.

Example

continuer|"First, Name"|123|12412|10/21/2020|"3|7"||Yes|No|No|

Expected output (3 and 7 are separated by a comma)

continuer|"First, Name"|123|12412|10/21/2020|"3,7"||Yes|No|No|

There may be more digits; it may not be just the two \d\|\d. It could be "3|7|2", and the correct output for that one has to be "3,7,2". I've tried the following:

cat <filename> | perl -pi -e 's/"\d+\|[\|\d]+/\d+,[\|\d]+/g'

but it just puts the actual string of d etc...

I'd really appreciate your help. ty

CodePudding user response：

If it must be a regex here is a simpler one

perl -wpe's/("[^"]+")/ $1 =~ s{\|}{,}gr /eg' file

Not bullet-proof but it should work for the shown use case.^†

Explanation. With the /e modifier the replacement side is evaluated as code. There, a regex runs on $1 under /r so that the original ($1) is unchanged; the $N variables are read-only, so we can't change $1 and thus couldn't run a "normal" s/// on it. With the /r modifier the changed string is returned, or the original if there were no changes. Just as ordered.

Once it's tested well enough add -i to change the input file "in-place" if wanted.

I must add, I see no reason that at least this part of the job can't be done using a CSV parser...

^† Tested with strings like in the question, extended only as far as this

con|"F, N"|12|10/21|"3|7"||Yes|"2||4|12"|"a|b"|No|""|end|

CodePudding user response：

If you cannot install modules, Text::ParseWords is a core module you can try. It can split a string and handle quoted delimiters.

use Text::ParseWords;

my $q = q(continuer|"First, Name"|123|12412|10/21/2020|"3|7"||Yes|No|No|);
print join "|", map { tr/|/,/; $_ } quotewords('\|', 1, $q);

As a one-liner, it would be:

perl -MText::ParseWords -pe'$_ = join "|", map { tr/|/,/; $_ } quotewords(q(\|), 1, $_);' yourfile.txt

CodePudding user response：

You said "Update 2: Actually, I can use a bash based solution as well." This script isn't bash, but you could call it from bash (or any other shell), which I assume is what you really mean by "bash based". It will work using any awk in any shell on every Unix box:

$ awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/\|/,",",$i)} 1' file
continuer|"First, Name"|123|12412|10/21/2020|"3,7"||Yes|No|No|
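The loop relies on one observation: when the field separator is a double quote, every even-numbered field is the text that sat between a pair of quotes. A small sketch to make that visible, assuming balanced quotes on each line (show_quote_fields is a hypothetical helper name):

```shell
# Hypothetical helper: print each quote-separated field with its number,
# so it is clear which fields the i=2,4,6... loop touches.
show_quote_fields() {
    awk 'BEGIN{FS="\""} {for (i=1; i<=NF; i++) printf "%d: %s\n", i, $i}'
}

printf '%s\n' 'a|"3|7"|b' | show_quote_fields
# prints:
# 1: a|
# 2: 3|7
# 3: |b
```

Only field 2 contains quoted text, so only its pipe would be replaced.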

Imagine yourself having to debug or enhance the clear, simple loop above vs. the regexp incantation you posted in your answer:

's/(?:(?<=")|\G(?!^))(\s*[^"|\s]+(?:\s+[^"|\s]+)*)\s*\|\s*(?=[^"]*")/$1,/g'

Remember - Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

I'm sure you could do what I'm doing with awk above natively in perl instead if you're trying to modify a perl script to add this functionality.

CodePudding user response：

I'd use a CSV parser, not regular expressions:

#!/usr/bin/env perl
use warnings;
use strict;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1, sep_char => "|"});

while (my $row = $csv->getline(*ARGV)) {
    @$row = map { tr/|/,/r } @$row;
    $csv->say(*STDOUT, $row);
}

example:

$ perl demo.pl input.txt
continuer|"First, Name"|123|12412|10/21/2020|3,7||Yes|No|No|

More verbose, but also more robust and a lot easier to understand.

CodePudding user response：

I'd use Text::CSV_XS.

perl -MText::CSV_XS=csv -e'
   csv
      in       => \*ARGV,
      sep_char => "|",
      on_in    => sub { tr/|/,/ for @{ $_[1] } };
'

You can provide the file name as an argument or provide the data via STDIN.

CodePudding user response：

This is working right now

's/(?:(?<=")|\G(?!^))(\s*[^"|\s]+(?:\s+[^"|\s]+)*)\s*\|\s*(?=[^"]*")/$1,/g'

Credit goes to my boss at work

Thanks everyone for looking.

I hope some of you realize that some projects have to be done a certain way, and that complicating an already very complicated pre-existing structure is not always an option at work. I knew there would be a one-liner for this; do not hate because you did not like that.