awk Array Average

Using AWK only (I'm trying to learn AWK), I'd like to evaluate whether a server sends less data to a specific IP than the others.

The logs contain:

The time of the log: from 01 in the morning to 24 in the evening (the same day)
The server's name
An IP that the server reached during that specific time slot
The number of times the server reached that IP during the time slot

Input:

$ cat Iplogs.txt  
Time,Source,destinationIP,Count  
11,server1,123.12.23.122,10  
11,server1,125.25.45.221,153  
11,server1,202.178.23.4,44  
11,server2,123.12.23.122,300  
11,server2,125.25.45.221,140  
11,server2,202.178.23.4,41  
12,server1,123.12.23.122,0  
12,server1,125.25.45.221,153  
12,server1,202.178.23.4,44  
12,server2,123.12.23.122,300  
12,server2,125.25.45.221,140  
12,server2,202.178.23.4,41

Expected Results:

server1,125.25.45.221,306,52.21%      #306/586*100
server2,125.25.45.221,280,47.78%      #280/586*100
server1,202.178.23.4,88,51.76%      #88/170*100
server1,123.12.23.122,10,1.63%      #10/610*100
server2,202.178.23.4,82,48.23%      #82/170*100
server2,123.12.23.122,600,98.36%      #600/610*100

=> 1.63% is the traffic from server1 to 123.12.23.122 divided by the total traffic from server1 and server2 to 123.12.23.122 over the whole day, multiplied by 100

What I have done so far: this command gives the cumulative Count for each IP:

$ awk -F"," '{IP[$3];MAX[$3]+=$4} END {for(i in IP) print i," ",IP[i]," ",MAX[i]}' Iplogs.txt
123.12.23.122      610
125.25.45.221      586
202.178.23.4      170

This command gives the cumulative Count for each server, by IP reached:

$ awk -F"," '{SRVbyIP[$2" "$3];COUNT[$2" "$3]+=$4} END {for(j in SRVbyIP) print j," ",SRVbyIP[j]," ",COUNT[j]}' Iplogs.txt | sort
server1 125.25.45.221      306
server2 125.25.45.221      280
server1 202.178.23.4      88
server1 123.12.23.122      10
server2 202.178.23.4      82
server2 123.12.23.122      600

... But I can't manage to find a way to divide COUNT[j]/MAX[i]

CodePudding user response：

Assumptions/understandings:

need to sum up Counts for each distinct IP
need to sum up Counts for each distinct host/IP pair
output should be a list of each distinct host/IP pair along with the sum for the host/IP pair, and the result of dividing the host/IP sum by the sum for the associated IP
output is to be sorted by host and then IP
we'll let awk perform normal rounding to 2 decimal places (eg, for a result of 1.6393 we should print 1.64); if OP needs to truncate (eg, 1.6393 becomes 1.63) then we'll need to make a small tweak to the code
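
To make the rounding/truncation distinction concrete, here is a minimal sketch using the server1 / 123.12.23.122 figures from the sample data (10 out of 610). The variable names rounded and truncated are only illustrative, and the truncation uses the int() approach, which is one possible way to do it:

```shell
# Share of traffic: 10 out of 610 (figures taken from the sample data).
# printf "%.2f" rounds to two decimals.
rounded=$(awk 'BEGIN { printf "%.2f", 10*100/610 }')

# Scaling by 100 before int() drops everything past the second decimal.
truncated=$(awk 'BEGIN { printf "%.2f", int(10*10000/610)/100 }')

echo "rounded=$rounded truncated=$truncated"
```

This should print rounded=1.64 truncated=1.63; the truncated value is the one shown in the OP's expected results.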

One awk approach:

awk '
BEGIN   { FS=OFS="," }                                         # define input/output field delimiters as a comma
FNR==1  { next }                                               # skip header line

        { hosts[$2]                                            # maintain list of hosts
          ip_sums[$3]+=$4                                      # sum up Counts by ip ($3)
          host_sums[$2,$3]+=$4                                 # sum up Counts by host ($2) and ip ($3)
        }

END     { for (host in hosts)                                  # loop through list of hosts
              for (ip in ip_sums) {                            # loop through list of ips for a given host

                  if (! ((host,ip) in host_sums)) continue     # if no entry in host_sums[] for this host/ip pair then skip to next iteration of loop

                  if (ip_sums[ip]==0)                          # if the sum for this ip is zero then address "divide by zero" error by ...
                     pct="0.00"                                # hardcoding the percent as 0.00
                  else {                                       # calculate percentage; uncommented line == "rounded"; commented line == "truncated"
                     pct=sprintf("%0.2f", host_sums[host,ip]*100/ip_sums[ip])
#                    pct=sprintf("%0.2f", int(host_sums[host,ip]*10000/ip_sums[ip]) /100)
                  }

                  print host,ip,host_sums[host,ip],pct "%"
              }
        }
' Iplogs.txt | sort -t',' -V -k1,1 -k2,2

This generates:

server1,123.12.23.122,10,1.64%
server1,125.25.45.221,306,52.22%
server1,202.178.23.4,88,51.76%
server2,123.12.23.122,600,98.36%
server2,125.25.45.221,280,47.78%
server2,202.178.23.4,82,48.24%

Sorted by IP and then host (sort -t',' -V -k2,2 -k1,1):

server1,123.12.23.122,10,1.64%
server2,123.12.23.122,600,98.36%
server1,125.25.45.221,306,52.22%
server2,125.25.45.221,280,47.78%
server1,202.178.23.4,88,51.76%
server2,202.178.23.4,82,48.24%
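
A possible variation, sketched here as an alternative rather than as part of the answer above: since awk stores a host_sums[host,ip] key as the two values joined by SUBSEP, the END block can loop over host_sums directly and split each key back into host and IP, which removes the need for the hosts[] array and the nested loop. The shell variables logs and result, and the awk names key and part, are made up for this example; the sample data is inlined so the snippet runs on its own, and a plain lexical sort is used in place of sort -V.

```shell
# Sample data from the question, inlined so the snippet is self-contained.
logs='Time,Source,destinationIP,Count
11,server1,123.12.23.122,10
11,server1,125.25.45.221,153
11,server1,202.178.23.4,44
11,server2,123.12.23.122,300
11,server2,125.25.45.221,140
11,server2,202.178.23.4,41
12,server1,123.12.23.122,0
12,server1,125.25.45.221,153
12,server1,202.178.23.4,44
12,server2,123.12.23.122,300
12,server2,125.25.45.221,140
12,server2,202.178.23.4,41'

result=$(printf '%s\n' "$logs" | awk '
BEGIN  { FS=OFS="," }
FNR==1 { next }                                    # skip header line
       { ip_sums[$3]+=$4                           # total Count per ip
         host_sums[$2,$3]+=$4                      # total Count per host/ip pair
       }
END    { for (key in host_sums) {                  # one pass over the host/ip pairs
             split(key, part, SUBSEP)              # part[1] = host, part[2] = ip
             ip = part[2]
             pct = (ip_sums[ip]==0) ? "0.00" : sprintf("%0.2f", host_sums[key]*100/ip_sums[ip])
             print part[1], ip, host_sums[key], pct "%"
         }
       }
' | LC_ALL=C sort -t',' -k1,1 -k2,2)

printf '%s\n' "$result"
```

With the sample data this should produce the same six lines as the host-then-IP listing above.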