I have a text file with some numeric data. It contains hyphens (-) that I need to replace with 0, and an MB suffix at the end of the numbers that needs to be removed, so that I'm left with only numbers.
Below is sample data from a file called file1:
Data:
$ cat file1
3708MB 5073MB 5153MB 0MB
- 63097MB 9939MB 53376MB
- 817MB 681MB 271MB
- 2655MB 692MB 2112MB
What I have tried:
$ /bin/sed 's/\r//g; s/-/0/g; s/MB//g' tt4
3708 5073 5153 0
0 63097 9939 53376
0 817 681 271
0 2655 692 2112
Or, to align the columns better with the column command:
$ /bin/sed 's/\r//g; s/-/0/g; s/MB//g' tt4| column -t
3708 5073 5153 0
0 63097 9939 53376
0 817 681 271
0 2655 692 2112
Is there a better way to make sure that a hyphen (-) is replaced only when it has nothing directly before or after it, and that MB is removed only when it is at the end of a number?
CodePudding user response:
You have to think about how to capture each pattern uniquely, so that it is isolated from any other appearance of the same characters.
Here, the - seems to be surrounded by blank spaces, so you can use that to distinguish it from, say, a - inside other text (e.g. text-text).
sed 's/ - / 0 /g'
For the pattern MB, you can make sure that you only match it when it is preceded by one or more digits.
sed -r 's/([0-9]+)MB/\1/g'
so together you can write:
sed -r 's/ - / 0 /g;s/([0-9]+)MB/\1/g'
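A quick way to check the combined expression is to feed it a line that mixes a standalone hyphen with a hyphenated word. This is only a sketch: the input lines and the function name clean_mb are made up for illustration, and -E is used in place of -r on the assumption that your sed accepts it (GNU sed and current BSD sed do). Note that a hyphen at the very start of a line has no space in front of it, so the ' - ' pattern leaves it alone.

```shell
# Hypothetical helper wrapping the combined expression from above
clean_mb() {
    sed -E 's/ - / 0 /g; s/([0-9]+)MB/\1/g'
}

# Standalone hyphen becomes 0, "text-text" is untouched
printf '%s\n' '3708MB - text-text 0MB' | clean_mb
# prints: 3708 0 text-text 0

# A hyphen in the first column is NOT replaced (no space before it)
printf '%s\n' '- 817MB 681MB' | clean_mb
# prints: - 817 681
```

If your data really has the hyphen in the first column with no leading whitespace, the pattern needs to allow for the start of the line as well.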
CodePudding user response:
Similar to the other answers but perhaps more portable:
sed '
s/[[:space:]]\{1,\}/  /g
s/^/ /
s/$/ /
s/ - / 0 /g
s/ \([0-9]\{1,\}\)MB / \1 /g
' tt4 | column -t
I've added whitespace guards around the MB numbers too. Every field needs a space of its own on each side, which means at least two space characters between neighbouring fields, so I've replaced the \r test with a more general one that turns each run of whitespace (including \r) into two spaces.
Adding a space at the beginning and end of each line means \| is not required; using it broke the code on FreeBSD.
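To see why two spaces are needed between fields, here is a small sketch comparing a one-space and a two-space separator. The function names and the input line are made up for illustration. With /g, sed does not rescan characters that an earlier match already consumed, so with a single space every second field loses its leading guard.

```shell
# One space between fields: neighbouring fields share a guard space
one_space() {
    sed 's/[[:space:]]\{1,\}/ /g; s/^/ /; s/$/ /; s/ \([0-9]\{1,\}\)MB / \1 /g'
}

# Two spaces between fields: every field has its own guard on each side
two_spaces() {
    sed 's/[[:space:]]\{1,\}/  /g; s/^/ /; s/$/ /; s/ \([0-9]\{1,\}\)MB / \1 /g'
}

# The middle field keeps its MB suffix with one space ...
printf '%s\n' '1MB 2MB 3MB' | one_space
# ... and loses it with two
printf '%s\n' '1MB 2MB 3MB' | two_spaces
```

The extra spaces in the output do not matter here because column -t realigns the fields afterwards.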
Or there's awk (which is probably easier to read):
awk '{
for (i=1; i<=NF; i++) {
if ($i=="-") $i=0
if ($i~/^[0-9]+MB$/) sub("MB","",$i)
}
print
}' tt4 | column -t
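Because awk tests each whitespace-separated field as a whole, anything that is not exactly - or digits followed by MB passes through unchanged. A minimal sketch to confirm that, using a made-up input line and a hypothetical wrapper function clean_fields:

```shell
# Hypothetical wrapper around the same per-field logic, reading stdin
clean_fields() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "-") $i = 0                      # lone hyphen -> 0
            if ($i ~ /^[0-9]+MB$/) sub(/MB$/, "", $i)  # strip trailing MB
        }
        print
    }'
}

# "a-b", "MBx" and "12MBs" do not match either test and are kept as-is
printf '%s\n' '- 817MB a-b MBx 12MBs' | clean_fields
# prints: 0 817 a-b MBx 12MBs
```

Assigning to a field makes awk rebuild the line with single spaces between fields, which is another reason to pipe the result through column -t.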
CodePudding user response:
Using sed
$ sed -Ez ':a;s/([0-9]+)MB/\1/;s/(\n+)-/\10/;ta' input_file
3708 5073 5153 0
0 63097 9939 53376
0 817 681 271
0 2655 692 2112
CodePudding user response:
Using GNU or BSD sed for -E, this may do what you want:
$ sed -E 's/(^| )-( |$)/\10\2/g; s/([0-9])MB( |$)/\1\2/g' file
3708 5073 5153 0
0 63097 9939 53376
0 817 681 271
0 2655 692 2112
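One caveat, which does not affect the sample data: with /g, the space that ends one match cannot also start the next one, so of two hyphens in adjacent columns only the first is replaced. A possible workaround, sketched here under the assumption that such rows can occur in your file, is to loop the hyphen substitution until it stops matching. The function name clean_loop and the input line are made up for illustration.

```shell
# Hypothetical variant: repeat the hyphen substitution until no match is left
clean_loop() {
    sed -E -e ':a' -e 's/(^| )-( |$)/\10\2/' -e 'ta' \
           -e 's/([0-9])MB( |$)/\1\2/g'
}

# Single pass with /g: the second hyphen survives
printf '%s\n' '- - 5MB' | sed -E 's/(^| )-( |$)/\10\2/g; s/([0-9])MB( |$)/\1\2/g'
# prints: 0 - 5

# Looping version: both hyphens are replaced
printf '%s\n' '- - 5MB' | clean_loop
# prints: 0 0 5
```

The separate -e arguments keep the :a label on its own, since not every sed accepts a label followed by a semicolon.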