How can I improve my sed command to extract data from a ping log file?


Sample log file:

jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.462/50.166/62.318 ms
jue 08 abr 2021 13:35:35 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 99 packets received, 1% packet loss
round-trip min/avg/max = 42.055/48.856/136.962 ms
jue 08 abr 2021 13:37:21 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.058/47.762/64.169 ms

My command so far:

cat sample.log | sed -r -e '/^... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}/{s/(... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}).*$/\1/g;n;d}' -e '/^--- google.*$/d' -e 's/100 packets transmitted.*([0-9]+%) packet.*$/\1/' -e '/round-trip/d'

Result obtained:

jue 08 abr 2021 13:33
0%
jue 08 abr 2021 13:35
1%
jue 08 abr 2021 13:37
0%

Desired result:

jue 08 abr 2021 13:33, 0%
jue 08 abr 2021 13:35, 1%
jue 08 abr 2021 13:37, 0%

Answer:

1st solution: This is really a task for awk. With your shown samples, please try the following awk code.

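awk -v OFS=", " '
# Timestamp line: keep "day dd mon yyyy HH:MM" (RLENGTH-3 drops the ":SS" seconds)
match($0,/^[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} ([0-9]{2}:){2}[0-9]{2}/){
  val=substr($0,RSTART,RLENGTH-3)
  next
}
# Summary line: the 3rd-last field is the packet loss, e.g. "0%"
/packets transmitted/{
  print val,$(NF-2)
  val=""
}
'  Input_file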

Explanation: The match function tests each line against the regex ^[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} ([0-9]{2}:){2}[0-9]{2} (explained below). If a match is found, the val variable is set to the matched text minus its last 3 characters (the ":SS" seconds), and next skips all further statements for that line. On a line containing packets transmitted, the code prints val along with the 3rd-last field of that line (the packet-loss percentage) and then clears val.

Explanation of regex:

^[a-zA-Z]+               ##Matching small/capital letters 1 or more occurrences from starting.
 [0-9]{2}                ##Matching space followed by 2 occurrences of digits.
 [a-zA-Z]+               ##Matching space followed by 1 or more occurrences of small/capital letters.
 [0-9]{4}                ##Matching space followed by 4 digits.
 ([0-9]{2}:){2}[0-9]{2}  ##Matching space followed by a group of 2 digits and a colon, repeated 2 times, followed by 2 more digits.
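If your awk does not support {n} interval expressions (for example mawk 1.3.3, or gawk older than 4.0 without --re-interval), or if accented day names such as mié or sáb can appear in the log, a field-based variant avoids the timestamp regex entirely. This is a sketch under the assumption, true for the sample, that every line containing " PING " starts with the fields "day dd mon yyyy HH:MM:SS"; the temporary file and the result variable exist only to make the example self-contained.

```shell
# Self-contained demo: write two records from the question's sample to a temp file.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.462/50.166/62.318 ms
jue 08 abr 2021 13:35:35 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 99 packets received, 1% packet loss
round-trip min/avg/max = 42.055/48.856/136.962 ms
EOF

# Assumption: every " PING " line starts with "day dd mon yyyy HH:MM:SS".
# Fields 1-4 are copied as-is; substr($5, 1, 5) keeps HH:MM and drops the seconds.
# On the summary line, $(NF-2) is the packet-loss field, e.g. "0%".
result=$(awk '
/ PING /              { val = $1 " " $2 " " $3 " " $4 " " substr($5, 1, 5); next }
/packets transmitted/ { print val ", " $(NF-2); val = "" }
' "$log")
printf '%s\n' "$result"
```

With the two records above this prints "jue 08 abr 2021 13:33, 0%" and "jue 08 abr 2021 13:35, 1%". Splitting on fields instead of matching a regex makes the sketch depend on the field positions rather than on the character classes in the day and month names.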

2nd solution: With GNU awk, we can put almost the same regex into the RS variable and use RT (the text that matched RS) to get the desired result as follows:

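awk -v RS='[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} [0-9]{2}:[0-9]{2}|[0-9]{1,3}%' -v OFS=", " '
# RT holds the text matched by RS: either a timestamp or a percentage
RT{
  # join with OFS after a timestamp, and with ORS (newline) after a percentage
  val=(val?val (++count%2==0?ORS:OFS):"") RT
}
END{
  print val
}
'  Input_file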
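The 2nd solution treats every timestamp and every percentage as a token (the text gawk stores in RT) and joins the tokens in pairs. If GNU awk is not available, roughly the same tokenize-then-pair idea can be sketched with grep -oE, which prints each match on its own line, followed by any awk that joins consecutive lines. This assumes a grep that supports -o (GNU and BSD grep both do) and that every timestamp is followed by exactly one percentage, as in the sample; the temporary file and the result variable are only there to keep the example self-contained.

```shell
# Self-contained demo: write two records from the question's sample to a temp file.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.462/50.166/62.318 ms
jue 08 abr 2021 13:35:35 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 99 packets received, 1% packet loss
round-trip min/avg/max = 42.055/48.856/136.962 ms
EOF

# grep -oE prints each token (a timestamp or a percentage) on its own line.
# awk then pairs them: odd lines are timestamps, even lines are percentages.
result=$(grep -oE '^[^ ]+ [0-9]{2} [^ ]+ [0-9]{4} [0-9]{2}:[0-9]{2}|[0-9]{1,3}%' "$log" |
  awk 'NR % 2 { ts = $0; next } { print ts ", " $0 }')
printf '%s\n' "$result"
```

The pairing step relies purely on line order, so a record whose summary line is missing would shift every following pair; the awk-based 1st solution does not have that weakness because it clears val after each summary line.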

Answer:

To get the desired format, you can pipe the output to:

sed 'N;s/\n/, /'

The final command becomes (note that you don't need cat here, because sed accepts the filename as an argument):

sed -r -e '/^... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}/{s/(... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}).*$/\1/g;n;d}' -e '/^--- google.*$/d' -e 's/100 packets transmitted.*([0-9]+%) packet.*$/\1/' -e '/round-trip/d'  sample.log | sed 'N;s/\n/, /'

Output:

jue 08 abr 2021 13:33, 0%
jue 08 abr 2021 13:35, 1%
jue 08 abr 2021 13:37, 0%
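The second sed can also be avoided by keeping the timestamp in the hold space until the matching summary line arrives. This is a minimal sketch assuming the layout of the sample log (a timestamp at the start of each PING line, later followed by a line containing packets transmitted). It uses [^ ]+ for the day and month names in case accented abbreviations appear, and it anchors the percentage on the space before it: with a greedy .* directly before ([0-9]+%), a two-digit loss such as 10% would likely come out as 0%. The temporary file and the result variable exist only to make the example self-contained.

```shell
# Self-contained demo: write two records from the question's sample to a temp file.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.462/50.166/62.318 ms
jue 08 abr 2021 13:35:35 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 99 packets received, 1% packet loss
round-trip min/avg/max = 42.055/48.856/136.962 ms
EOF

# -n: print nothing unless asked to.
# Timestamp line: keep "day dd mon yyyy HH:MM" and copy it to the hold space (h).
# Summary line: keep only the loss percentage, append it to the hold space (H),
# swap hold and pattern space (x), join the two lines with ", " and print (p).
result=$(sed -nE '
/^[^ ]+ [0-9]{2} [^ ]+ [0-9]{4} [0-9]{2}:[0-9]{2}/{
  s/^([^ ]+ [0-9]{2} [^ ]+ [0-9]{4} [0-9]{2}:[0-9]{2}).*/\1/
  h
}
/packets transmitted/{
  s/.* ([0-9]+%) packet loss.*/\1/
  H
  x
  s/\n/, /p
}
' "$log")
printf '%s\n' "$result"
```

Every other line (the blank line, the --- header and round-trip) is simply never printed because of -n, so no explicit delete commands are needed.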