Using sed for multiple conditional replacements in a file

Experts, I have a text file with some numeric data. It contains hyphens (-), which I need to replace with 0, and an MB suffix on the numbers, which I need to remove so that only the numbers remain.

Below is sample data in a file called file1:

Data:

$ cat file1

 3708MB 5073MB 5153MB  0MB
 -    63097MB 9939MB  53376MB
 -    817MB   681MB   271MB
 -    2655MB   692MB   2112MB

What I have tried:

$ /bin/sed   's/\r//g; s/-/0/g; s/MB//g' tt4
 3708 5073 5153  0
 0    63097 9939  53376
 0    817   681   271
 0    2655   692   2112

Or, to align the columns better with the column command:

$ /bin/sed   's/\r//g; s/-/0/g; s/MB//g' tt4| column -t
3708  5073   5153  0
0     63097  9939  53376
0     817    681   271
0     2655   692   2112

Is there a better way to make sure that a hyphen is replaced only when it stands alone, with nothing attached before or after it, and that MB is removed only when it comes at the end of a number?

CodePudding user response：

You have to think about how to capture each pattern uniquely, so that it is isolated from any other appearance of the same text.

Here, - seems to be surrounded by blank spaces. So you can use that to make it unique from, say, any other text with - ( e.g. text-text ).

sed 's/ - / 0 /g'

For the pattern MB, you can make sure you match it only where it is preceded by one or more digits:

sed -r 's/([0-9]+)MB/\1/g'

So together you can write:

sed -r 's/ - / 0 /g;s/([0-9]+)MB/\1/g'
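
As a quick check, here is a minimal sketch that runs the combined command on two lines of the question's data plus one made-up line (the `a-b  MBps  12MB` line is an assumption for illustration, not from the question). It uses -E instead of -r, since -E is accepted by both GNU and BSD sed.

```shell
# Two lines from the question plus one invented edge-case line (hypothetical input).
input=$(printf '%s\n' \
    ' 3708MB 5073MB 5153MB  0MB' \
    ' -    63097MB 9939MB  53376MB' \
    ' a-b  MBps  12MB')

# Replace only a space-delimited hyphen, and strip MB only directly after digits.
result=$(printf '%s\n' "$input" | sed -E 's/ - / 0 /g; s/([0-9]+)MB/\1/g')
printf '%s\n' "$result"
```

The hyphen in a-b and the MB in MBps are left alone. A hyphen at the very start or end of a line, with no space on one side, would not be matched by ' - '.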

CodePudding user response：

Similar to the other answers but perhaps more portable:

sed '
    s/[[:space:]]\{1,\}/  /g
    s/^/ /
    s/$/ /
    s/ - / 0 /g
    s/ \([0-9]\{1,\}\)MB / \1 /g
' tt4 | column -t

I've added whitespace guards around the MB numbers too. Each guarded match consumes a space at each end, so adjacent fields need at least two spaces between them. That is why I've replaced the \r test with a more general substitution that turns every run of whitespace (including \r) into two spaces.

Adding a space at the beginning and end of each line means \| is not required; using it broke the code on FreeBSD.
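
A small sketch of why the separator is widened to two spaces, as I read it: s///g never reuses characters from a previous match, so two guarded fields cannot share a single space. The inputs below are made up for illustration.

```shell
# One space between the hyphens: the first match consumes it,
# so the second hyphen has no leading space left and is missed.
one=$(printf '%s\n' ' - - ' | sed 's/ - / 0 /g')

# Two spaces between the hyphens: each has its own leading and trailing space.
two=$(printf '%s\n' ' -  - ' | sed 's/ - / 0 /g')

# Brackets make the surrounding spaces visible.
printf '[%s]\n[%s]\n' "$one" "$two"
```

This prints `[ 0 - ]` and then `[ 0  0 ]`.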

Or there's awk (which is probably easier to read):

awk '{
    for (i=1; i<=NF; i++) {
        if ($i=="-") $i=0
        if ($i~/^[0-9]+MB$/) sub("MB","",$i)
    }
    print
}' tt4 | column -t
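
Here is a small sketch of the awk approach on made-up input (the second line is invented to show what is left untouched). One thing to be aware of: assigning to a field makes awk rebuild the line with single spaces between fields, so the original spacing is lost, which is where column -t helps.

```shell
# First line is from the question; second line is a hypothetical edge case.
result=$(printf '%s\n' ' -    817MB   681MB   271MB' ' x-y    5MBs   7MB   -' |
awk '{
    for (i = 1; i <= NF; i++) {
        if ($i == "-") $i = 0                      # a field that is exactly "-"
        if ($i ~ /^[0-9]+MB$/) sub("MB", "", $i)   # digits followed by MB, nothing else
    }
    print
}')
printf '%s\n' "$result"
```

This prints `0 817 681 271` and `x-y 5MBs 7 0`: because each test is applied to a whole field, x-y and 5MBs are not changed.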

CodePudding user response：

Using sed

$ sed -Ez ':a;s/([0-9]+)MB/\1/;s/(\n )-/\10/;ta' input_file
 3708 5073 5153  0
 0    63097 9939  53376
 0    817   681   271
 0    2655   692   2112

CodePudding user response：

Using GNU or BSD sed for -E, this may do what you want:

$ sed -E 's/(^| )-( |$)/\10\2/g; s/([0-9])MB( |$)/\1\2/g' file
 3708 5073 5153  0
 0    63097 9939  53376
 0    817   681   271
 0    2655   692   2112
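
To see what the anchors (^| ) and ( |$) buy you, here is a sketch on invented input with a hyphen at the end of a line and a few near-misses (a-b, -5, 12MBps). The input lines are assumptions for illustration, not from the question.

```shell
# Hypothetical input: a lone hyphen at the end of a line, plus look-alikes.
result=$(printf '%s\n' ' -    63097MB 9939MB  -' 'a-b -5 12MBps 7MB' |
    sed -E 's/(^| )-( |$)/\10\2/g; s/([0-9])MB( |$)/\1\2/g')
printf '%s\n' "$result"
```

This prints ` 0    63097 9939  0` and `a-b -5 12MBps 7`. Only a hyphen bounded by a space or a line edge on both sides becomes 0, and MB is removed only when it directly follows a digit and is followed by a space or the end of the line.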