The problem is simple. I do not want:
2022-01-09 03:07:15 observation line [data log flushed] 2.42 0.33 MISSED SYNC STEP -3.25 0.67 28 27 12 0
The sequence [data log flushed] should be skipped over entirely.
Imagine the weblog:
2022-01-09 10:01:48 signal strength low.
2022-01-09 10:03:35 observation line [data log flushed] 3.92 3.83 3.25 -1.75 0.67 34 28 20 0
2022-01-09 11:00:50 observation line [data log flushed] 4.42 5.33 3.75 -0.75 0.67 32 31 22 2
2022-01-09 12:04:43 observation line 4.42 5.83 4.75 0.75 2.17 37 27 23 2
2022-01-09 13:02:53 observation line 4.42 7.33 6.25 2.25 3.67 33 32 20 2
How should I filter out the junk, so that I collect only this output:
2022-01-09 10:01:48
2022-01-09 10:03:35 3.92 3.83 3.25 -1.75 0.67 34 28 20 0
2022-01-09 11:00:50 4.42 5.33 3.75 -0.75 0.67 32 31 22 2
2022-01-09 12:04:43 4.42 5.83 4.75 0.75 2.17 37 27 23 2
2022-01-09 13:02:53 4.42 7.33 6.25 2.25 3.67 33 32 20 2
My current solution involves first using:
lines=$(awk -F' ' '{print $1","$2","$5","$6","$7","$8","$9","$10","$11","$12","$13} END{print ""}' < $myfile )
But whenever a line contains [data log flushed], the field numbers shift by three, so my first 3 values are replaced by [data, log and flushed]. How would you skip over this recurring pattern?
CodePudding user response:
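Using GNU sed (\? is a GNU extension), delete every letter or square bracket, together with one following space or dot if present: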
sed 's/[][:alpha:][][ .]\?//g' < myfile
2022-01-09 10:01:48
2022-01-09 10:03:35 3.92 3.83 3.25 -1.75 0.67 34 28 20 0
2022-01-09 11:00:50 4.42 5.33 3.75 -0.75 0.67 32 31 22 2
2022-01-09 12:04:43 4.42 5.83 4.75 0.75 2.17 37 27 23 2
2022-01-09 13:02:53 4.42 7.33 6.25 2.25 3.67 33 32 20 2
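The expression above deletes every letter and bracket, so it also erases text you might want to keep. If only the literal marker has to go, a narrower substitution may be enough. This is a minimal sketch, assuming the marker always appears exactly as [data log flushed] followed by one space; strip_marker is a hypothetical helper name, not something from the question:

```shell
# strip_marker: hypothetical helper. Removes the literal
# "[data log flushed] " marker and leaves the rest of the line untouched.
strip_marker() {
    sed 's/\[data log flushed\] //' "$@"
}

# Usage: with the marker gone, the values are back in fields 5 and up,
# so a field-based awk command sees the same layout on every line.
printf '%s\n' '2022-01-09 10:03:35 observation line [data log flushed] 3.92 3.83 3.25 -1.75 0.67 34 28 20 0' |
    strip_marker
```

Lines without the marker pass through unchanged, so the result can be piped into the field-based awk command from the question.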
CodePudding user response:
Given:
$ cat file
2022-01-09 10:01:48 signal strength low.
2022-01-09 10:03:35 observation line [data log flushed] 3.92 3.83 3.25 -1.75 0.67 34 28 20 0
2022-01-09 11:00:50 observation line [data log flushed] 4.42 5.33 3.75 -0.75 0.67 32 31 22 2
2022-01-09 12:04:43 observation line 4.42 5.83 4.75 0.75 2.17 37 27 23 2
2022-01-09 13:02:53 observation line 4.42 7.33 6.25 2.25 3.67 33 32 20 2
With awk you can delete a specific sequence like so:
awk '{
    sub(/signal strength low\./, "")
    sub(/observation line /, "")
    sub(/\[data log flushed\] /, "")
} $1=$1' file
Prints:
2022-01-09 10:01:48
2022-01-09 10:03:35 3.92 3.83 3.25 -1.75 0.67 34 28 20 0
2022-01-09 11:00:50 4.42 5.33 3.75 -0.75 0.67 32 31 22 2
2022-01-09 12:04:43 4.42 5.83 4.75 0.75 2.17 37 27 23 2
2022-01-09 13:02:53 4.42 7.33 6.25 2.25 3.67 33 32 20 2
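The trailing $1=$1 in the script above does two jobs. Assigning to a field makes awk rebuild the record with single spaces between fields, and the assigned value also serves as the pattern that decides whether the line is printed. Here is a small sketch of that behaviour; the function names squeeze and squeeze_all and the sample strings are made up for illustration:

```shell
# squeeze: hypothetical helper using the same idiom as the answer.
# Rebuilding the record joins the fields with OFS (one space by default),
# which collapses runs of blanks and trims both ends of the line.
squeeze() {
    awk '$1=$1' "$@"
}

# squeeze_all: hypothetical variant that rebuilds the record and then
# always prints, whatever the first field contains.
squeeze_all() {
    awk '{$1=$1} 1' "$@"
}

printf '  2022-01-09   10:01:48  \n' | squeeze   # blanks collapsed
printf '0 kept?\n' | squeeze                     # dropped: first field is 0
printf '0 kept?\n' | squeeze_all                 # printed
```

For this log the first field is always a date, so the shorter form is safe. The second form is the more defensive choice when a line could start with 0 or be empty.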
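Or, if you want to delete anything that is not part of a number, date or time, you could do:
awk '{
    gsub(/[^0-9 \t:.-]/, "")   # keep digits, blanks, colons, dots and hyphens
    gsub(/ \.( |$)/, " ")      # drop a stray dot left behind by deleted text
}$1=$1' file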
Prints:
2022-01-09 10:01:48
2022-01-09 10:03:35 3.92 3.83 3.25 -1.75 0.67 34 28 20 0
2022-01-09 11:00:50 4.42 5.33 3.75 -0.75 0.67 32 31 22 2
2022-01-09 12:04:43 4.42 5.83 4.75 0.75 2.17 37 27 23 2
2022-01-09 13:02:53 4.42 7.33 6.25 2.25 3.67 33 32 20 2
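Another way to look at the problem is to keep what you want instead of deleting what you do not. The sketch below assumes that the first two fields are always the date and time and that every wanted value is a plain decimal number. It joins the kept fields with commas, as the awk command in the question does. The function name numeric_fields is hypothetical:

```shell
# numeric_fields: hypothetical helper. Keeps the date, the time, and every
# later field that looks like a decimal number; everything else is skipped.
numeric_fields() {
    awk '{
        out = $1 "," $2                        # date and time are always kept
        for (i = 3; i <= NF; i++)              # scan the remaining fields
            if ($i ~ /^-?[0-9]+(\.[0-9]+)?$/)  # optional sign, digits, optional fraction
                out = out "," $i
        print out
    }' "$@"
}

# Usage: words and the bracketed marker are skipped without being named.
printf '%s\n' \
    '2022-01-09 10:01:48 signal strength low.' \
    '2022-01-09 10:03:35 observation line [data log flushed] 3.92 3.83 3.25 -1.75 0.67 34 28 20 0' |
    numeric_fields
```

Because no junk text is listed explicitly, a new message in the log should not need a change to the script, as long as it contains no standalone numbers. Replace "," with " " for space-separated output.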