I have a question about parsing data with awk.
What I have works, but it doesn't look efficient and could be improved on the points I note below.
I'd appreciate any suggestions and help on this.
- Scenario one:
Raw Data:
# dmesg | awk '/blk_update_request/{ if ($7 =="sector") print $0}' | head -5
[14740442.054675] blk_update_request: I/O error, dev sde, sector 3618747392
[14740442.055693] blk_update_request: I/O error, dev sde, sector 3618746368
[14740442.056807] blk_update_request: I/O error, dev sde, sector 3618745344
[14740442.057927] blk_update_request: I/O error, dev sde, sector 3618744320
[14740442.059074] blk_update_request: I/O error, dev sde, sector 3618743296
Trial:
# dmesg | awk '/blk_update_request/{ if ($7 =="sector") print $6}'| cut -d, -f1|head -5
sde
sde
sde
sde
sde
Note
This works well, but just to strip the trailing comma from sde, I'm piping into cut, which adds yet another process to the pipeline.
Desired
Can this be done within awk itself, in a single call?
- Scenario two:
Raw Data:
Mar 20 05:15:02 transpire kernel: [15432418.855144] EXT4-fs (dm-13): error count since last fsck: 100
Mar 20 06:16:12 transpire kernel: [15436088.797185] EXT4-fs (dm-12): error count since last fsck: 20
Mar 20 07:17:22 transpire kernel: [15439758.736285] EXT4-fs (dm-15): error count since last fsck: 40
Mar 20 07:17:22 transpire kernel: [15439758.736293] EXT4-fs (dm-14): error count since last fsck: 2
Mar 20 19:48:50 transpire kernel: [15484846.579068] EXT4-fs (dm-11): error count since last fsck: 55
Mar 20 19:48:50 transpire kernel: [15484846.580064] EXT4-fs (dm-10): error count since last fsck: 41
Mar 21 05:16:49 transpire kernel: [15518924.611572] EXT4-fs (dm-13): error count since last fsck: 100
Mar 21 06:17:59 transpire kernel: [15522594.553205] EXT4-fs (dm-12): error count since last fsck: 20
Mar 21 07:19:09 transpire kernel: [15526264.495077] EXT4-fs (dm-14): error count since last fsck: 2
Mar 21 07:19:09 transpire kernel: [15526264.495086] EXT4-fs (dm-15): error count since last fsck: 40
# awk '/dm/{print $8|"sort -u"}' /var/log/messages
(dm-10):
(dm-11):
(dm-12):
(dm-13):
(dm-14):
(dm-15):
error
Trial:
# awk '/dm/{print $8|"sort -u"}' /var/log/messages|tr -d '():'|sed '$ d'
dm-10
dm-11
dm-12
dm-13
dm-14
dm-15
Again, as above, I could not figure out how to do this in a single awk call, so I'm using tr to chop the parens and sed to remove "error", which comes out as the last line.
Desired: Can this be done in the same awk call?
CodePudding user response:
With your shown samples, please try the following awk solutions.
For your first output (to get sde), try this awk code:
awk -F',|[[:space:]]+' '/blk_update_request/ && $(NF-1)=="sector" {print $(NF-3)}' Input_file
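A quick way to sanity-check this one-liner is to feed it one of the sample lines from the question via printf (a stand-in here for the real Input_file):

```shell
# One sample dmesg line from the question, piped in for a self-contained check.
printf '[14740442.054675] blk_update_request: I/O error, dev sde, sector 3618747392\n' |
awk -F',|[[:space:]]+' '/blk_update_request/ && $(NF-1)=="sector" {print $(NF-3)}'
# prints: sde
```

The alternation in the field separator means the comma and the following space each count as a separator, leaving an empty field between them, which is why the device name lands at $(NF-3) rather than $(NF-2).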
For your 2nd output (to get the dm values), try this awk code:
awk -F"[)(]" '!arr[$2]++ {print $2 | "sort -n"}' Input_file
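The !arr[$2]++ part is the classic seen-array idiom: the test is true only the first time a key appears, so each dm value is printed once. A minimal check, using a few lines trimmed from the question's sample data:

```shell
# Duplicate dm-13 appears twice in the input but once in the output.
printf '%s\n' \
  'Mar 20 05:15:02 transpire kernel: [15432418.855144] EXT4-fs (dm-13): error count since last fsck: 100' \
  'Mar 20 06:16:12 transpire kernel: [15436088.797185] EXT4-fs (dm-12): error count since last fsck: 20' \
  'Mar 21 05:16:49 transpire kernel: [15518924.611572] EXT4-fs (dm-13): error count since last fsck: 100' |
awk -F'[)(]' '!arr[$2]++ {print $2 | "sort -n"}'
# dm-12 and dm-13 each appear once; the repeated dm-13 line is suppressed
```

Splitting on the parens puts the bare dm-N name in $2, so no later tr is needed, and lines without parens never populate $2 with "error".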
CodePudding user response:
Where you use dmesg | awk for the first example and /var/log/messages as the input file for the second example, I will use the example data from the question and read from a file.
For the first part you might change the field separator to an optional comma and 1 or more spaces.
Then do the comparison with field nr 7 and print field nr 6.
awk -F',?[[:space:]]+' '/blk_update_request/ && $7 =="sector"{ print $6 }' file
If there can be more than 1 comma, you can also replace all commas in field nr 6 and then print it.
awk '/blk_update_request/ && $7 =="sector"{ gsub(/,+/, "", $6); print $6 }' file
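A quick check of the gsub variant on one sample line (printf stands in for the file here):

```shell
# With the default FS, field 6 is "sde," and field 7 is "sector".
printf '[14740442.054675] blk_update_request: I/O error, dev sde, sector 3618747392\n' |
awk '/blk_update_request/ && $7=="sector" { gsub(/,+/, "", $6); print $6 }'
# prints: sde   (the trailing comma in "sde," is stripped by gsub)
```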
Output
sde
sde
sde
sde
sde
For the second part you could use gsub to replace the characters with an empty string, and then print field nr 8.
awk '{ gsub(/[():]+/, "", $8); print $8 }' file | sort -u
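Sanity check on one sample line:

```shell
# Field 8 is "(dm-13):"; gsub strips the parens and colon in place.
printf 'Mar 20 05:15:02 transpire kernel: [15432418.855144] EXT4-fs (dm-13): error count since last fsck: 100\n' |
awk '{ gsub(/[():]+/, "", $8); print $8 }'
# prints: dm-13
```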
Output
dm-10
dm-11
dm-12
dm-13
dm-14
dm-15
CodePudding user response:
I would use GNU AWK for the 2nd task in the following way. Let file.txt content be:
Mar 20 05:15:02 transpire kernel: [15432418.855144] EXT4-fs (dm-13): error count since last fsck: 100
Mar 20 06:16:12 transpire kernel: [15436088.797185] EXT4-fs (dm-12): error count since last fsck: 20
Mar 20 07:17:22 transpire kernel: [15439758.736285] EXT4-fs (dm-15): error count since last fsck: 40
Mar 20 07:17:22 transpire kernel: [15439758.736293] EXT4-fs (dm-14): error count since last fsck: 2
Mar 20 19:48:50 transpire kernel: [15484846.579068] EXT4-fs (dm-11): error count since last fsck: 55
Mar 20 19:48:50 transpire kernel: [15484846.580064] EXT4-fs (dm-10): error count since last fsck: 41
Mar 21 05:16:49 transpire kernel: [15518924.611572] EXT4-fs (dm-13): error count since last fsck: 100
Mar 21 06:17:59 transpire kernel: [15522594.553205] EXT4-fs (dm-12): error count since last fsck: 20
Mar 21 07:19:09 transpire kernel: [15526264.495077] EXT4-fs (dm-14): error count since last fsck: 2
Mar 21 07:19:09 transpire kernel: [15526264.495086] EXT4-fs (dm-15): error count since last fsck: 4
then
awk 'BEGIN{PROCINFO["sorted_in"]="@ind_str_asc";FPAT="dm-[[:digit:]]+"}NF{arr[$1]}END{for(i in arr){print i}}' file.txt
output
dm-10
dm-11
dm-12
dm-13
dm-14
dm-15
Explanation: I instruct GNU AWK to traverse the array by its string indices in ascending order, and define a field (via FPAT) as dm- followed by 1 or more digits. NF is a built-in variable holding the number of fields; used as a condition, it is true if at least 1 field is present on the line. In that case I refer to arr[$1], which creates that key in the array (note that no value has to be assigned). After processing all lines I print all keys using for...in, in the order dictated by PROCINFO["sorted_in"].
(tested in gawk 4.2.1)
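A minimal reproduction of the FPAT/PROCINFO behaviour, with a non-matching line thrown in to show the NF guard at work (requires GNU awk; the lines below are shortened from the sample data):

```shell
# Requires gawk: FPAT and PROCINFO["sorted_in"] are GNU extensions.
printf '%s\n' \
  'EXT4-fs (dm-13): error count since last fsck: 100' \
  'EXT4-fs (dm-12): error count since last fsck: 20' \
  'a line with no device name at all' \
  'EXT4-fs (dm-13): error count since last fsck: 100' |
gawk 'BEGIN { PROCINFO["sorted_in"]="@ind_str_asc"; FPAT="dm-[[:digit:]]+" }
      NF    { arr[$1] }                  # only lines with a dm-N field reach here
      END   { for (i in arr) print i }'  # keys emerge in ascending string order
# prints: dm-12 then dm-13
```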
CodePudding user response:
Scenario 1:
$ awk -F'[ ,]+' '/blk_update_request/ && ($7=="sector"){print $6}' file
sde
sde
sde
sde
sde
Scenario 2:
If you just want the unique values then:
$ awk -F'[()]' '!seen[$2]++ {print $2}' file
dm-13
dm-12
dm-15
dm-14
dm-11
dm-10
but if you want them sorted then the most efficient (and portable and robust) way is:
$ awk -F'[()]' '{print $2}' file | sort -t'-' -u -k1,1 -k2,2n
dm-10
dm-11
dm-12
dm-13
dm-14
dm-15
Note that the above sorts numerically on the number part, so if you have strings other than dm and/or single- or triple-digit values alongside double-digit ones, it'll all still sort correctly.
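To see why the numeric sort key matters, compare against plain lexical sorting. The dm-2 and dm-100 names below are hypothetical, added only to exercise the single- and triple-digit cases:

```shell
printf 'dm-2\ndm-10\ndm-100\n' | sort -u
# lexical order: dm-10, dm-100, dm-2

printf 'dm-2\ndm-10\ndm-100\n' | sort -t'-' -u -k1,1 -k2,2n
# numeric on the 2nd field: dm-2, dm-10, dm-100
```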
You COULD move the call to sort inside the awk script, but then you'd still have a pipe and you'd be adding the overhead of awk spawning a subshell for it, so that'd be less efficient.
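For completeness, the all-inside-awk variant being discussed would look roughly like this. Output is identical, but awk forks a shell to run the piped sort, which is the inefficiency mentioned above:

```shell
# Same result as awk | sort, but the pipe lives inside the awk program.
printf '%s\n' \
  'Mar 20 07:17:22 transpire kernel: [15439758.736285] EXT4-fs (dm-15): error count since last fsck: 40' \
  'Mar 20 19:48:50 transpire kernel: [15484846.579068] EXT4-fs (dm-11): error count since last fsck: 55' \
  'Mar 21 05:16:49 transpire kernel: [15518924.611572] EXT4-fs (dm-13): error count since last fsck: 100' |
awk -F'[()]' '{print $2 | "sort -t- -u -k1,1 -k2,2n"}'
# prints: dm-11, dm-13, dm-15 (one per line)
```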