I have a question about parsing data with awk.
What I have works, but it doesn't look efficient and could be improved on the points I note below.
I'd appreciate any suggestions and help on this.
- Scenario one:
Raw Data:
# dmesg | awk '/blk_update_request/{ if ($7 =="sector") print $0}' | head -5
[14740442.054675] blk_update_request: I/O error, dev sde, sector 3618747392
[14740442.055693] blk_update_request: I/O error, dev sde, sector 3618746368
[14740442.056807] blk_update_request: I/O error, dev sde, sector 3618745344
[14740442.057927] blk_update_request: I/O error, dev sde, sector 3618744320
[14740442.059074] blk_update_request: I/O error, dev sde, sector 3618743296
Trial:
# dmesg | awk '/blk_update_request/{ if ($7 =="sector") print $6}'| cut -d, -f1|head -5
sde
sde
sde
sde
sde
Note
This works well, but just to strip the trailing comma from sde, I'm piping into cut, which adds yet another process to the pipeline.
Desired
Can this be done within awk itself, in a single call?
- Scenario two:
Raw Data:
Mar 20 05:15:02 transpire kernel: [15432418.855144] EXT4-fs (dm-13): error count since last fsck: 100
Mar 20 06:16:12 transpire kernel: [15436088.797185] EXT4-fs (dm-12): error count since last fsck: 20
Mar 20 07:17:22 transpire kernel: [15439758.736285] EXT4-fs (dm-15): error count since last fsck: 40
Mar 20 07:17:22 transpire kernel: [15439758.736293] EXT4-fs (dm-14): error count since last fsck: 2
Mar 20 19:48:50 transpire kernel: [15484846.579068] EXT4-fs (dm-11): error count since last fsck: 55
Mar 20 19:48:50 transpire kernel: [15484846.580064] EXT4-fs (dm-10): error count since last fsck: 41
Mar 21 05:16:49 transpire kernel: [15518924.611572] EXT4-fs (dm-13): error count since last fsck: 100
Mar 21 06:17:59 transpire kernel: [15522594.553205] EXT4-fs (dm-12): error count since last fsck: 20
Mar 21 07:19:09 transpire kernel: [15526264.495077] EXT4-fs (dm-14): error count since last fsck: 2
Mar 21 07:19:09 transpire kernel: [15526264.495086] EXT4-fs (dm-15): error count since last fsck: 40
# awk '/dm/{print $8|"sort -u"}' /var/log/messages
(dm-10):
(dm-11):
(dm-12):
(dm-13):
(dm-14):
(dm-15):
error
Trial:
# awk '/dm/{print $8|"sort -u"}' /var/log/messages|tr -d '():'|sed '$ d'
dm-10
dm-11
dm-12
dm-13
dm-14
dm-15
Again, as above, I could not figure out how to do this in a single awk call, so I'm using tr to chop the parens and sed to remove "error", which comes out as the last line.
Desired: Can this be done in the same awk call?
CodePudding user response:
With your shown samples, please try the following awk solutions.
For your first output (to get sde), try this awk code:
awk -F',|[[:space:]]+' '/blk_update_request/ && $(NF-1)=="sector" {print $(NF-3)}' Input_file
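A quick way to sanity-check this one-liner is to feed it one of the sample lines from the question via printf (a stand-in here for the real Input_file):

```shell
# One sample dmesg line from the question, piped in for a self-contained check.
printf '[14740442.054675] blk_update_request: I/O error, dev sde, sector 3618747392\n' |
awk -F',|[[:space:]]+' '/blk_update_request/ && $(NF-1)=="sector" {print $(NF-3)}'
# prints: sde
```

The alternation in the field separator means the comma and the following space each count as a separator, leaving an empty field between them, which is why the device name lands at $(NF-3) rather than $(NF-2).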
For your 2nd output (to get the dm values), try this awk code:
awk -F"[)(]" '!arr[$2]++ {print $2 | "sort -n"}' Input_file
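The !arr[$2]++ part is the classic seen-array idiom: the test is true only the first time a key appears, so each dm value is printed once. A minimal check, using a few lines trimmed from the question's sample data:

```shell
# Duplicate dm-13 appears twice in the input but once in the output.
printf '%s\n' \
  'Mar 20 05:15:02 transpire kernel: [15432418.855144] EXT4-fs (dm-13): error count since last fsck: 100' \
  'Mar 20 06:16:12 transpire kernel: [15436088.797185] EXT4-fs (dm-12): error count since last fsck: 20' \
  'Mar 21 05:16:49 transpire kernel: [15518924.611572] EXT4-fs (dm-13): error count since last fsck: 100' |
awk -F'[)(]' '!arr[$2]++ {print $2 | "sort -n"}'
# dm-12 and dm-13 each appear once; the repeated dm-13 line is suppressed
```

Splitting on the parens puts the bare dm-N name in $2, so no later tr is needed, and lines without parens never populate $2 with "error".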
CodePudding user response:
Where you use dmesg | awk for the first example and /var/log/messages as the input file for the second example, I will use the example data from the question and read from a file.
For the first part you might change the field separator to an optional comma and 1 or more spaces.
Then do the comparison with field nr 7 and print field nr 6.
awk -F',?[[:space:]]+' '/blk_update_request/ && $7 =="sector"{ print $6 }' file
If there can be more than 1 comma, you can also replace all commas in field nr 6 and then print it.
awk '/blk_update_request/ && $7 =="sector"{ gsub(/,+/, "", $6); print $6 }' file
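A quick check of the gsub variant on one sample line (printf stands in for the file here):

```shell
# With the default FS, field 6 is "sde," and field 7 is "sector".
printf '[14740442.054675] blk_update_request: I/O error, dev sde, sector 3618747392\n' |
awk '/blk_update_request/ && $7=="sector" { gsub(/,+/, "", $6); print $6 }'
# prints: sde   (the trailing comma in "sde," is stripped by gsub)
```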
Output
sde
sde
sde
sde
sde
For the second part you could use gsub to replace the characters with an empty string, and then print field nr 8.
awk '{ gsub(/[():]+/, "", $8); print $8 }' file | sort -u
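Sanity check on one sample line:

```shell
# Field 8 is "(dm-13):"; gsub strips the parens and colon in place.
printf 'Mar 20 05:15:02 transpire kernel: [15432418.855144] EXT4-fs (dm-13): error count since last fsck: 100\n' |
awk '{ gsub(/[():]+/, "", $8); print $8 }'
# prints: dm-13
```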
Output
dm-10
dm-11
dm-12
dm-13
dm-14
dm-15
CodePudding user response:
I would use GNU AWK for the 2nd task in the following way. Let file.txt content be:
Mar 20 05:15:02 transpire kernel: [15432418.855144] EXT4-fs (dm-13): error count since last fsck: 100
Mar 20 06:16:12 transpire kernel: [15436088.797185] EXT4-fs (dm-12): error count since last fsck: 20
Mar 20 07:17:22 transpire kernel: [15439758.736285] EXT4-fs (dm-15): error count since last fsck: 40
Mar 20 07:17:22 transpire kernel: [15439758.736293] EXT4-fs (dm-14): error count since last fsck: 2
Mar 20 19:48:50 transpire kernel: [15484846.579068] EXT4-fs (dm-11): error count since last fsck: 55
Mar 20 19:48:50 transpire kernel: [15484846.580064] EXT4-fs (dm-10): error count since last fsck: 41
Mar 21 05:16:49 transpire kernel: [15518924.611572] EXT4-fs (dm-13): error count since last fsck: 100
Mar 21 06:17:59 transpire kernel: [15522594.553205] EXT4-fs (dm-12): error count since last fsck: 20
Mar 21 07:19:09 transpire kernel: [15526264.495077] EXT4-fs (dm-14): error count since last fsck: 2
Mar 21 07:19:09 transpire kernel: [15526264.495086] EXT4-fs (dm-15): error count since last fsck: 4
then
awk 'BEGIN{PROCINFO["sorted_in"]="@ind_str_asc";FPAT="dm-[[:digit:]]+"}NF{arr[$1]}END{for(i in arr){print i}}' file.txt
output
dm-10
dm-11
dm-12
dm-13
dm-14
dm-15
Explanation: I instruct GNU AWK to traverse the array by its string indices in ascending order, and define a field (via FPAT) as dm- followed by 1 or more digits. NF is a built-in variable holding the number of fields; used as a condition, it is true if at least 1 field is present on the line. In that case I refer to arr[$1], which creates that key in the array (note that no value has to be assigned). After processing all lines I print all keys using for...in, in the order dictated by PROCINFO["sorted_in"].
(tested in gawk 4.2.1)
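A minimal reproduction of the FPAT/PROCINFO behaviour, with a non-matching line thrown in to show the NF guard at work (requires GNU awk; the lines below are shortened from the sample data):

```shell
# Requires gawk: FPAT and PROCINFO["sorted_in"] are GNU extensions.
printf '%s\n' \
  'EXT4-fs (dm-13): error count since last fsck: 100' \
  'EXT4-fs (dm-12): error count since last fsck: 20' \
  'a line with no device name at all' \
  'EXT4-fs (dm-13): error count since last fsck: 100' |
gawk 'BEGIN { PROCINFO["sorted_in"]="@ind_str_asc"; FPAT="dm-[[:digit:]]+" }
      NF    { arr[$1] }                  # only lines with a dm-N field reach here
      END   { for (i in arr) print i }'  # keys emerge in ascending string order
# prints: dm-12 then dm-13
```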
CodePudding user response:
Scenario 1:
$ awk -F'[ ,]+' '/blk_update_request/ && ($7=="sector"){print $6}' file
sde
sde
sde
sde
sde
Scenario 2:
If you just want the unique values then:
$ awk -F'[()]' '!seen[$2]++ {print $2}' file
dm-13
dm-12
dm-15
dm-14
dm-11
dm-10
but if you want them sorted then the most efficient (and portable and robust) way is:
$ awk -F'[()]' '{print $2}' file | sort -t'-' -u -k1,1 -k2,2n
dm-10
dm-11
dm-12
dm-13
dm-14
dm-15
Note that the above sorts numerically on the number part, so if you have strings other than dm and/or single- or triple-digit values alongside double-digit ones, it'll all still sort correctly.
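To see why the numeric sort key matters, compare against plain lexical sorting. The dm-2 and dm-100 names below are hypothetical, added only to exercise the single- and triple-digit cases:

```shell
printf 'dm-2\ndm-10\ndm-100\n' | sort -u
# lexical order: dm-10, dm-100, dm-2

printf 'dm-2\ndm-10\ndm-100\n' | sort -t'-' -u -k1,1 -k2,2n
# numeric on the 2nd field: dm-2, dm-10, dm-100
```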
You COULD move the call to sort inside the awk script, but then you'd still have a pipe and you'd be adding the overhead of awk spawning a subshell for it, so that'd be less efficient.
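For completeness, the all-inside-awk variant being discussed would look roughly like this. Output is identical, but awk forks a shell to run the piped sort, which is the inefficiency mentioned above:

```shell
# Same result as awk | sort, but the pipe lives inside the awk program.
printf '%s\n' \
  'Mar 20 07:17:22 transpire kernel: [15439758.736285] EXT4-fs (dm-15): error count since last fsck: 40' \
  'Mar 20 19:48:50 transpire kernel: [15484846.579068] EXT4-fs (dm-11): error count since last fsck: 55' \
  'Mar 21 05:16:49 transpire kernel: [15518924.611572] EXT4-fs (dm-13): error count since last fsck: 100' |
awk -F'[()]' '{print $2 | "sort -t- -u -k1,1 -k2,2n"}'
# prints: dm-11, dm-13, dm-15 (one per line)
```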