Remove string based on start and end pattern and remove newline in the process

Time:12-20

I have a file that contains the output of some commands; unfortunately, some of the lines are mangled by console error messages:

path="/a/b/c" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="132" host="cluster.local:1095" fstpath="/data/00019507/3dcd7e00" size="4574568" statsize="45745error: unable to retrieve file meta data from cluster.local:1095 [ status=down ]
nrep="00" fsid="37" host="cluster.local:1095" fstpath="/data/000021ca/0527e888" size="12550144" statsize="12550144" checksum="bb2a2ea700000000000000000000000000000000" diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"
path="/a/b/b98d6d3a-5c77-4223-9601-9294c73e00f9.bin" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="36" host="cluster.local:1095" fstpath="/data/00002196/05200f4d" size="12550144" statsize="12550144" checksum="045a6aa400000000000000000000000000000000" diskchecksum="045a6aa400000000000000000000000000000000" error_label="nonerror: unable to retrieve file meta data from cluster.local:1095 [ status=(down) ]
e"
path="/a/b/c/.mb6589013703229118680.txt" fxid="0524071a" size="0" nrep="2" checksumtype="adler" checksum="0000000100000000000000000000000000000000"
nrep="00" fsid="196" host="cluster.local:1095" fstpath="/dataerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
/000021b0/0524071a" size="0" statsize="0" checksum="0000000100000000000000000000000000000000" diskchecksum="0000000000000000000000000000000000000000" error_label="none"

Basically, I'd like the string that starts with error: unable and ends with the ] character to be removed completely, so that instead of:

diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"

I will have:

diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="none"

I have tried the following:

sed -e 's/error:.*]$//g'

However that gives me:

diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="n
one"

How do I get it to remove the newline as well when it's removing the bad string?
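Why the newline survives: sed reads its input one line at a time and strips the trailing newline before running the script, so ]$ can only match at the end of that line and there is never a newline in the pattern space for s/// to delete. A minimal reproduction, using a shortened two-line excerpt of the data above:

```shell
#!/bin/sh
# sed applies s/// to each line separately; the trailing newline is not part
# of the pattern space, so the substitution cannot join the two lines.
out=$(printf '%s\n' \
  'error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]' \
  'one"' | sed -e 's/error:.*]$//g')
printf '%s\n' "$out"
# Prints:
# error_label="n
# one"
```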

Thanks

User response:

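With your shown samples, please try the following awk, which sets awk's RS to the empty string (paragraph mode) so that each blank-line-separated block, here the whole file, is read as a single record and the newlines inside it can be matched. Written and tested in GNU awk.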

awk -v RS="" '{gsub(/error: unable[^]]+]\n*/,"")} 1' Input_file

Explanation: gsub globally replaces everything from error: unable up to and including the next ], plus any newlines (zero or more) that immediately follow it, with an empty string; the trailing 1 then prints the modified record.
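A minimal sketch of the same paragraph-mode gsub() wrapped in a shell function that filters stdin to stdout. The name strip_errors_awk is hypothetical, and the input is a shortened two-line excerpt of the question's data. It is written for GNU awk as in the answer; other POSIX awks should accept it too, but that is an assumption.

```shell
#!/bin/sh
# Hypothetical wrapper: reads stdin, writes the cleaned text to stdout.
strip_errors_awk() {
  # RS="" (paragraph mode): each blank-line-separated block is one record and
  # keeps its internal newlines, so gsub() can match "\n".
  # [^]]+] matches from "error: unable" up to and including the first "]";
  # \n* then removes any newlines directly after it.
  awk -v RS="" '{ gsub(/error: unable[^]]+]\n*/, "") } 1'
}

out=$(printf '%s\n' \
  'error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]' \
  'one"' | strip_errors_awk)
printf '%s\n' "$out"
# Prints: error_label="none"
```

One side effect of RS="": blank lines act as record separators and each record is printed with a single trailing newline, so any blank lines in the input are dropped from the output.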

User response:

Using GNU sed, you can do this:

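sed ':a; /error: unable.*/ {s///;N;s/\n//;ba;}' file

The :a ... ba loop re-runs the check on the joined line, so two mangled lines in a row (as in the first record of the sample) are both cleaned.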

Or using awk:

awk '{while (sub(/error: unable.*/, "") && (getline nxt) > 0) $0 = $0 nxt; print}' file

User response:

Using sed:

$ sed '/nerror:/{s/\(error_label=\)"nerror: unable[^]]*]/\1"none"/g;n;d}' input_file

User response:

With GNU sed for -E (to enable EREs) and -z (to read the whole file at once and so allow us to match newlines in the regexp):

$ sed -Ez 's/error: unable[^]]+](\r?\n)?//g' file
path="/a/b/c" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="132" host="cluster.local:1095" fstpath="/data/00019507/3dcd7e00" size="4574568" statsize="45745nrep="00" fsid="37" host="cluster.local:1095" fstpath="/data/000021ca/0527e888" size="12550144" statsize="12550144" checksum="bb2a2ea700000000000000000000000000000000" diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="none"
path="/a/b/b98d6d3a-5c77-4223-9601-9294c73e00f9.bin" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="36" host="cluster.local:1095" fstpath="/data/00002196/05200f4d" size="12550144" statsize="12550144" checksum="045a6aa400000000000000000000000000000000" diskchecksum="045a6aa400000000000000000000000000000000" error_label="none"
path="/a/b/c/.mb6589013703229118680.txt" fxid="0524071a" size="0" nrep="2" checksumtype="adler" checksum="0000000100000000000000000000000000000000"
nrep="00" fsid="196" host="cluster.local:1095" fstpath="/data/000021b0/0524071a" size="0" statsize="0" checksum="0000000100000000000000000000000000000000" diskchecksum="0000000000000000000000000000000000000000" error_label="none"

The above also handles the case where the text you want to remove is followed by a newline, whether that newline is just \n or \r\n.
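A quick check of the \r\n case, as a minimal sketch on a shortened, hypothetical excerpt with DOS line endings. Like the answer, it assumes GNU sed, since -E and -z are GNU options.

```shell
#!/bin/sh
# -z makes sed split input on NUL bytes, so a file without NULs is read as one
# chunk and "\n" can be matched; (\r?\n)? then consumes the "\r\n" after "]".
out=$(printf 'error_label="nerror: unable to retrieve [ status=(null) ]\r\none"\r\n' |
  sed -Ez 's/error: unable[^]]+](\r?\n)?//g')
# The remaining line keeps its own CRLF ending; $(...) strips only the final \n,
# so "$out" ends in \r. od -c makes the \r visible.
printf '%s\n' "$out" | od -c
```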
