I have to clean several CSV files before loading them into a database. Some of the files have an unexpected line break in the middle of a line. Since a line should always end with a number, I managed to fix the files with this one-liner:
perl -pe 's/[^0-9]\r?\n//g'
While this works, it also removes the last character before the line break:
foob
ar
turns into
fooar
Is there a Perl one-liner I can call that follows the same rule without removing the last character before the line break?
CodePudding user response:
Just capture the last character and put it back:
perl -pe 's/([^0-9])\r?\n/$1/g'
CodePudding user response:
One way is to use \K:
perl -pe 's/[^0-9]\K\r?\n//g'
It keeps everything matched before it out of the substitution, so only what follows \K (here, the line ending) is replaced, and the preceding character is left untouched. It behaves much like a lookbehind.
However, I'd rather recommend processing your CSV with a library, even though it's a little more code. There has already been one problem, that linefeed inside a field; what else may be in there? A good library can handle a variety of irregularities.
A simple example with Text::CSV:
use warnings;
use strict;
use feature 'say';

use Text::CSV;

my $file = shift or die "Usage: $0 file.csv\n";

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

open my $fh, '<', $file or die "Can't open $file: $!";

while (my $row = $csv->getline($fh)) {
    s/\n //g for @$row;
    $csv->say(\*STDOUT, $row);
}
Consider the other constructor options (also available via accessors), which are good for all kinds of unexpected problems; allow_whitespace, for example.
This can be done as a command-line program ("one-liner") as well, if there is a reason for that. The library's functional interface via csv is then very convenient:
perl -MText::CSV=csv -we'
csv in => *ARGV, on_in => sub { s/\n //g for @{$_[1]} }' filename
With *ARGV, the input is taken either from a file named on the command line or from STDIN.