Using sed to make multiple replacements based on a condition in a file

Time:07-23

Experts, I have a text file with some numeric data. Some fields are just a hyphen (-), which I need to replace with 0, and the numbers end in MB, which also needs to be removed so that I get only numbers.

Below is sample data in a file called file1:

Data:

$ cat file1

 3708MB 5073MB 5153MB  0MB
 -    63097MB 9939MB  53376MB
 -    817MB   681MB   271MB
 -    2655MB   692MB   2112MB

What I have tried:

$ /bin/sed   's/\r//g; s/-/0/g; s/MB//g' tt4
 3708 5073 5153  0
 0    63097 9939  53376
 0    817   681   271
 0    2655   692   2112

Or, to columnize the output more neatly with the column command:

$ /bin/sed   's/\r//g; s/-/0/g; s/MB//g' tt4| column -t
3708  5073   5153  0
0     63097  9939  53376
0     817    681   271
0     2655   692   2112

Is there a better way to make sure, strictly, that only a hyphen (-) with nothing before or after it is replaced, and likewise that MB is removed only when it is at the end of a number?
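To make the concern concrete, here is a small sketch on a hypothetical line (not taken from file1) that contains one lone hyphen field plus values that merely contain a hyphen or MB. It applies only the hyphen and MB substitutions from the attempt above; the variable names are illustrative.

```shell
#!/bin/sh
# Hypothetical line: one lone "-" field, plus values that merely CONTAIN "-" or "MB".
line=' -  text-text  -5  5MBps  1024MB'

# The unanchored substitutions from the attempt rewrite every occurrence.
naive=$(printf '%s\n' "$line" | sed 's/-/0/g; s/MB//g')
printf '%s\n' "$naive"
```

Only the leading 0 and the 1024 are intended changes; text-text, -5 and 5MBps are all damaged, which is why the patterns need guards.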

Answer:

You have to think about how to capture the pattern(s) uniquely, so as to isolate them from any other occurrence of the same pattern(s).

Here, - appears to be surrounded by spaces, so you can use that to distinguish it from any other text containing - (e.g. text-text).

sed 's/ - / 0 /g'

For the pattern MB, you can make sure that it is preceded by one or more digits.


sed -r 's/([0-9]+)MB/\1/g'

So, together, you can write:

sed -r 's/ - / 0 /g;s/([0-9]+)MB/\1/g'
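A quick way to check the combined command is to feed it the first rows of the question's sample through printf. This sketch uses -E, which both GNU and BSD sed accept, in place of GNU's -r, and writes the digit class as [0-9]+ (one or more digits); the variable names are only for illustration.

```shell
#!/bin/sh
# First rows of the question's sample (leading space included, no CR).
sample=' 3708MB 5073MB 5153MB  0MB
 -    63097MB 9939MB  53376MB
 -    817MB   681MB   271MB'

# " - " only matches a hyphen with a space on both sides;
# ([0-9]+)MB only matches MB that directly follows digits.
out=$(printf '%s\n' "$sample" | sed -E 's/ - / 0 /g; s/([0-9]+)MB/\1/g')
printf '%s\n' "$out"
```

Note that ([0-9]+)MB has no guard after MB, so it would still turn something like 5MBps into 5ps; the answers further down add a trailing ( |$) or space guard for that case.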

Answer:

Similar to the other answers but perhaps more portable:

sed '
    s/[[:space:]]\{1,\}/  /g
    s/^/ /
    s/$/ /
    s/ - / 0 /g
    s/ \([0-9]\{1,\}\)MB / \1 /g
' tt4 | column -t

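I've added whitespace guards around the MB numbers too. Because a g substitution cannot reuse a space that an earlier match has consumed, each gap between fields needs at least two space characters (one for each neighbouring guard), so I've replaced the \r removal with a more general substitution that turns every run of whitespace, \r included, into two spaces.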

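Adding a space at the beginning and end of each line means that alternation with \| (to also match ^ or $) is not required; \| is a GNU extension in basic regular expressions, and using it broke the code on FreeBSD.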
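The reason for widening the whitespace shows up on a minimal hypothetical line with two lone hyphens separated by a single space. A g substitution resumes scanning after the end of each match, so the space shared by the two fields is used up by the first match:

```shell
#!/bin/sh
# Single separating space: the first match consumes it,
# so the second " - " is never seen in the same pass.
single=$(printf ' - - \n' | sed 's/ - / 0 /g')

# After widening every whitespace run to two spaces, each field has its own guards.
double=$(printf ' - - \n' | sed 's/[[:space:]]\{1,\}/  /g; s/ - / 0 /g')

printf '%s|\n%s|\n' "$single" "$double"
```

The first result still contains a hyphen; the second replaces both. column -t later collapses the extra spaces, so they do not affect the final layout.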


Or there's awk (which is probably easier to read):

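awk '{
    sub(/\r$/, "")                                  # strip a trailing CR (DOS line endings)
    for (i=1; i<=NF; i++) {
        if ($i=="-") $i=0                           # field that is exactly "-" becomes 0
        if ($i~/^[0-9]+MB$/) sub("MB","",$i)        # digits followed by MB and nothing else
    }
    print
}' tt4 | column -t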
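A self-contained check of the same field-by-field logic on a hypothetical line with edge cases: lone hyphens at both ends, a hyphenated word, and MB followed by more text. It leaves out column -t, so the output shows the default single-space OFS that awk uses when it rebuilds $0 after a field is assigned.

```shell
#!/bin/sh
# Hypothetical input line with edge cases.
line='-  text-text  5MBps  1024MB  -'

out=$(printf '%s\n' "$line" | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i == "-") $i = 0                       # whole field is exactly "-"
        if ($i ~ /^[0-9]+MB$/) sub(/MB$/, "", $i)   # digits then MB, nothing else
    }
    print                                           # $0 was rebuilt with OFS (one space)
}')
printf '%s\n' "$out"
```

Because the tests compare whole fields, text-text and 5MBps are left untouched without any extra guards.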

Answer:

Using GNU sed (for -z):

$ sed -Ez ':a;s/([0-9]+)MB/\1/;s/(\n )-/\10/;ta' input_file
 3708 5073 5153  0
 0    63097 9939  53376
 0    817   681   271
 0    2655   692   2112
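A sketch of how this behaves, assuming GNU sed. With -z the input is split on NUL bytes, so a file without NULs becomes one pattern space with embedded newlines, and the hyphen substitution only sees a hyphen that follows a newline and a space. The hypothetical sample below starts with a lone hyphen on its first line to show that limitation. The second command is a variant (not from the answer) that uses the g flag instead of the :a/ta loop and adds ^ so the first line is covered as well.

```shell
#!/bin/sh
# Hypothetical sample whose FIRST line starts with a lone hyphen.
sample=' -    817MB   681MB
 -    2655MB   692MB'

# The answer's loop: each pass strips one "MB" and one hyphen that follows "\n ";
# ta jumps back to :a as long as some substitution succeeded.
looped=$(printf '%s\n' "$sample" | sed -Ez ':a;s/([0-9]+)MB/\1/;s/(\n )-/\10/;ta')

# Same edits with the g flag, plus "^ " so a hyphen on the first line is replaced too.
fixed=$(printf '%s\n' "$sample" | sed -Ez 's/([0-9]+)MB/\1/g;s/(^ |\n )-/\10/g')

printf '%s\n---\n%s\n' "$looped" "$fixed"
```

In the question's data the first line has no hyphen, which is why the looped version produces the expected output there.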

Answer:

Using GNU or BSD sed for -E, this may do what you want:

$ sed -E 's/(^| )-( |$)/\10\2/g; s/([0-9])MB( |$)/\1\2/g' file
 3708 5073 5153  0
 0    63097 9939  53376
 0    817   681   271
 0    2655   692   2112
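To check the edge cases this version is meant to handle, here is a sketch on a hypothetical line where the lone hyphens sit at the very start and end (so only ^ or $ guards them), next to a hyphenated word and an MB that is followed by more text. A second hypothetical line has two lone hyphens sharing one space.

```shell
#!/bin/sh
# Hypothetical edge cases: hyphen fields at the start and end of the line,
# a hyphenated word, and "MB" followed by more characters.
line='- text-text 5MBps 1024MB -'
out=$(printf '%s\n' "$line" | sed -E 's/(^| )-( |$)/\10\2/g; s/([0-9])MB( |$)/\1\2/g')

# Two lone hyphens separated by a single space.
adjacent=$(printf '%s\n' '- - 5' | sed -E 's/(^| )-( |$)/\10\2/g')

printf '%s\n%s\n' "$out" "$adjacent"
```

The second case shows one limitation: when two lone hyphens share a single separating space, the first match consumes it, so the second hyphen is left alone in that pass. In the question's sample no two lone hyphens are adjacent, so this does not come up there.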