awk Array Average

Time:01-03

Using AWK only (nothing else, I'm trying to learn AWK), I'd like to evaluate whether a server sends less data to a specific IP than the other servers do.

The logs contain:

  • The time of the log: from 01 in the morning to 24 in the evening (the same day)
  • The server's name
  • An IP that the server reached during that specific time slot
  • The number of times the server reached that IP during the time slot

Input:

$ cat Iplogs.txt  
Time,Source,destinationIP,Count  
11,server1,123.12.23.122,10  
11,server1,125.25.45.221,153  
11,server1,202.178.23.4,44  
11,server2,123.12.23.122,300  
11,server2,125.25.45.221,140  
11,server2,202.178.23.4,41  
12,server1,123.12.23.122,0  
12,server1,125.25.45.221,153  
12,server1,202.178.23.4,44  
12,server2,123.12.23.122,300  
12,server2,125.25.45.221,140  
12,server2,202.178.23.4,41

Expected Results:

server1,125.25.45.221,306,52.21%      #306/586*100
server2,125.25.45.221,280,47.78%      #280/586*100
server1,202.178.23.4,88,51.76%      #88/170*100
server1,123.12.23.122,10,1.63%      #10/610*100
server2,202.178.23.4,82,48.23%      #82/170*100
server2,123.12.23.122,600,98.36%      #600/610*100

=> 1.63% is the traffic from server1 to 123.12.23.122 divided by the total traffic from server1 and server2 to 123.12.23.122 over the whole day, multiplied by 100.

What I did so far: this command gives the cumulative Count for each IP:

$ awk -F"," '{IP[$3];MAX[$3]+=$4} END {for(i in IP) print i," ",IP[i]," ",MAX[i]}' Iplogs.txt
123.12.23.122      610
125.25.45.221      586
202.178.23.4      170

This command gives the cumulative Count for each server, by IP reached:

$ awk -F"," '{SRVbyIP[$2" "$3];COUNT[$2" "$3]+=$4} END {for(j in SRVbyIP) print j," ",SRVbyIP[j]," ",COUNT[j]}' Iplogs.txt | sort
server1 125.25.45.221      306
server2 125.25.45.221      280
server1 202.178.23.4      88
server1 123.12.23.122      10
server2 202.178.23.4      82
server2 123.12.23.122      600

... But I can't manage to find a way to divide COUNT[j]/MAX[i]

Answer:

Assumptions/understandings:

  • need to sum up Counts for each distinct IP
  • need to sum up Counts for each distinct host/IP pair
  • output should be a list of each distinct host/IP pair along with the sum for the host/IP pair, and the result of dividing the host/IP sum by the sum for the associated IP
  • output is to be sorted by host and then IP
  • we'll let awk perform normal rounding to 2 decimal places (eg, for a result of 1.6393 we should print 1.64); if OP needs to truncate (eg, 1.6393 becomes 1.63) then we'll need to make a small tweak to the code
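
To see the rounding-versus-truncation difference from the last point in isolation, here is a minimal sketch. The wrapper name round_vs_trunc and its two arguments (a host/IP sum and the IP total) are made up for illustration; they are not part of OP's code.

```shell
# hypothetical helper: round_vs_trunc <host/ip sum> <ip total>
round_vs_trunc() {
    awk -v n="$1" -v d="$2" 'BEGIN {
        v = n * 100 / d                        # raw percentage, eg 1.6393...
        r = sprintf("%.2f", v)                 # normal rounding to 2 decimals
        t = sprintf("%.2f", int(v * 100) / 100)   # truncate after 2 decimals
        printf "rounded=%s truncated=%s\n", r, t
    }'
}

round_vs_trunc 10 610      # server1 -> 123.12.23.122
```

For 10 out of 610 this prints rounded=1.64 truncated=1.63, which matches the difference between the output further down and OP's expected results.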

One awk approach:

awk '
BEGIN   { FS=OFS="," }                                         # define input/output field delimiters as a comma
FNR==1  { next }                                               # skip header line

        { hosts[$2]                                            # maintain list of hosts
          ip_sums[$3]+=$4                                      # sum up Counts by ip ($3)
          host_sums[$2,$3]+=$4                                 # sum up Counts by host ($2) and ip ($3)
        }

END     { for (host in hosts)                                  # loop through list of hosts
              for (ip in ip_sums) {                            # loop through list of ips for a given host

                  if (! ((host,ip) in host_sums)) continue     # if no entry in host_sums[] for this host/ip pair then skip to next iteration of loop

                  if (ip_sums[ip]==0)                          # if the sum for this ip is zero then address "divide by zero" error by ...
                     pct="0.00"                                # hardcoding the percent as 0.00
                  else {                                       # calculate percentage; uncommented line == "rounded"; commented line == "truncated"
                     pct=sprintf("%0.2f", host_sums[host,ip]*100/ip_sums[ip])
#                    pct=sprintf("%0.2f", int(host_sums[host,ip]*10000/ip_sums[ip]) /100)
                  }

                  print host,ip,host_sums[host,ip],pct "%"
              }
        }
' Iplogs.txt | sort -t',' -V -k1,1 -k2,2

This generates:

server1,123.12.23.122,10,1.64%
server1,125.25.45.221,306,52.22%
server1,202.178.23.4,88,51.76%
server2,123.12.23.122,600,98.36%
server2,125.25.45.221,280,47.78%
server2,202.178.23.4,82,48.24%

Sorted by IP and then host (sort -t',' -V -k2,2 -k1,1):

server1,123.12.23.122,10,1.64%
server2,123.12.23.122,600,98.36%
server1,125.25.45.221,306,52.22%
server2,125.25.45.221,280,47.78%
server1,202.178.23.4,88,51.76%
server2,202.178.23.4,82,48.24%
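
A possible variation, offered only as a sketch: instead of looping over every host and every IP and skipping the pairs that don't exist, loop directly over the host/IP pairs and split each composite key on SUBSEP. The names pct_by_pair, pair_sums and part are illustrative, and the rounding behaviour is assumed to be the same as above.

```shell
# hypothetical wrapper: pct_by_pair <logfile>
pct_by_pair() {
    awk -F',' '
    FNR==1 { next }                                # skip header line
           { ip_sums[$3]      += $4                # sum up Counts by ip
             pair_sums[$2,$3] += $4                # sum up Counts by host/ip pair
           }
    END    { for (key in pair_sums) {              # only pairs that actually occur
                 split(key, part, SUBSEP)          # part[1] is the host, part[2] is the ip
                 total = ip_sums[part[2]]
                 pct   = (total == 0) ? 0 : pair_sums[key] * 100 / total   # avoid divide by zero
                 printf "%s,%s,%d,%.2f%%\n", part[1], part[2], pair_sums[key], pct
             }
           }
    ' "$1"
}
```

Called as pct_by_pair Iplogs.txt | sort -t',' -V -k1,1 -k2,2, this should produce the same six lines as the first listing. The order of a for (key in array) loop is unspecified in awk, so the sort is still needed.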