Using AWK (nothing else, I'm trying to learn AWK), I'd like to evaluate whether a server sends less data to a specific IP than the others.
The logs contain:
- The time of the log: from 01 in the morning to 24 in the evening (the same day)
- The server's name
- An IP that the server reached during that specific time slot.
- The number of times the server reached that IP during the time slot.
Input:
$ cat Iplogs.txt
Time,Source,destinationIP,Count
11,server1,123.12.23.122,10
11,server1,125.25.45.221,153
11,server1,202.178.23.4,44
11,server2,123.12.23.122,300
11,server2,125.25.45.221,140
11,server2,202.178.23.4,41
12,server1,123.12.23.122,0
12,server1,125.25.45.221,153
12,server1,202.178.23.4,44
12,server2,123.12.23.122,300
12,server2,125.25.45.221,140
12,server2,202.178.23.4,41
Expected Results:
server1,125.25.45.221,306,52.21% #306/586*100
server2 125.25.45.221,47.78% #280/586*100
server1 202.178.23.4,51.76% #88/170*100
server1 123.12.23.122,1.63% #10/610*100
server2 202.178.23.4,48.23% #82/170*100
server2 123.12.23.122,98.36% #600/610*100
=> 1.63% is equal to the traffic from server1 to 123.12.23.122 divided by the total traffic from server1 and server2 to 123.12.23.122 over the whole day, multiplied by 100
What I have done so far: this command gives the cumulative Count for each IP:
$ awk -F"," '{IP[$3];MAX[$3]+=$4} END {for(i in IP) print i," ",IP[i]," ",MAX[i]}' Iplogs.txt
123.12.23.122 610
125.25.45.221 586
202.178.23.4 170
This command gives the cumulative Count for each server, by IP reached:
$ awk -F"," '{SRVbyIP[$2" "$3];COUNT[$2" "$3]+=$4} END {for(j in SRVbyIP) print j," ",SRVbyIP[j]," ",COUNT[j]}' Iplogs.txt | sort
server1 125.25.45.221 306
server2 125.25.45.221 280
server1 202.178.23.4 88
server1 123.12.23.122 10
server2 202.178.23.4 82
server2 123.12.23.122 600
... But I can't manage to find a way to divide COUNT[j]/MAX[i]
Answer:
Assumptions/understandings:
- need to sum up Counts for each distinct IP
- need to sum up Counts for each distinct host/IP pair
- output should be a list of each distinct host/IP pair along with the sum for the host/IP pair, and the result of dividing the host/IP sum by the sum for the associated IP
- output is to be sorted by host and then IP
- we'll let awk perform normal rounding to 2 decimal places (eg, for a result of 1.6393 we should print 1.64); if OP needs to truncate (eg, 1.6393 becomes 1.63) then we'll need to make a small tweak to the code
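The difference between rounding and truncating can be checked in isolation before wiring it into a larger script. The following is only a minimal sketch: the function name round_vs_trunc is made up for illustration, and the values 10 and 610 are the server1 and total Counts for 123.12.23.122 from the sample data.

```shell
# Hypothetical helper name, used only so the snippet can be rerun easily.
round_vs_trunc() {
    awk 'BEGIN {
        part = 10; total = 610    # server1 Count and overall Count for 123.12.23.122
        # sprintf("%.2f") rounds to 2 decimal places
        rounded = sprintf("%.2f", part * 100 / total)
        # scaling by 10000 and applying int() drops everything past 2 decimals first
        truncated = sprintf("%.2f", int(part * 10000 / total) / 100)
        print rounded, truncated
    }'
}

round_vs_trunc    # prints: 1.64 1.63
```

The truncated form matches the 1.63% shown in the question's expected results, while the rounded form gives 1.64%.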
One awk approach:
awk '
BEGIN { FS=OFS="," }                  # define input/output field delimiters as a comma
FNR==1 { next }                       # skip header line
      { hosts[$2]                     # maintain list of hosts
        ip_sums[$3]+=$4               # sum up Counts by ip ($3)
        host_sums[$2,$3]+=$4          # sum up Counts by host ($2) and ip ($3)
      }
END   { for (host in hosts)           # loop through list of hosts
            for (ip in ip_sums) {     # loop through list of ips for a given host
                if (! ((host,ip) in host_sums)) continue   # if no entry in host_sums[] for this host/ip pair then skip to next iteration of loop
                if (ip_sums[ip]==0)   # if the sum for this ip is zero then address "divide by zero" error by ...
                    pct="0.00"        # hardcoding the percent as 0.00
                else {                # calculate percentage; uncommented line == "rounded"; commented line == "truncated"
                    pct=sprintf("%0.2f", host_sums[host,ip]*100/ip_sums[ip])
                #   pct=sprintf("%0.2f", int(host_sums[host,ip]*10000/ip_sums[ip]) /100)
                }
                print host,ip,host_sums[host,ip],pct "%"
            }
      }
' Iplogs.txt | sort -t',' -V -k1,1 -k2,2
This generates:
server1,123.12.23.122,10,1.64%
server1,125.25.45.221,306,52.22%
server1,202.178.23.4,88,51.76%
server2,123.12.23.122,600,98.36%
server2,125.25.45.221,280,47.78%
server2,202.178.23.4,82,48.24%
Sorted by IP and then host (sort -t',' -V -k2,2 -k1,1):
server1,123.12.23.122,10,1.64%
server2,123.12.23.122,600,98.36%
server1,125.25.45.221,306,52.22%
server2,125.25.45.221,280,47.78%
server1,202.178.23.4,88,51.76%
server2,202.178.23.4,82,48.24%
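Since the question asks for awk only, here is one possible variation that avoids the external sort by reading the file twice: the first pass accumulates the sums, and the second pass prints each host/IP pair the first time it appears. This is a sketch under assumptions rather than a drop-in replacement: the function name pct_two_pass and the temporary file are made up for illustration, and the output follows the order of first appearance in the input instead of a sorted order.

```shell
# Hypothetical helper name; takes the log file path as its only argument.
pct_two_pass() {
    awk -F',' '
    FNR == 1 { next }               # skip the header line in both passes
    NR == FNR {                     # first pass: accumulate the sums
        ip_sums[$3] += $4           # total Count per ip
        host_sums[$2,$3] += $4      # total Count per host/ip pair
        next
    }
    !(($2,$3) in seen) {            # second pass: first sighting of each host/ip pair
        seen[$2,$3]                 # referencing the element is enough to create it
        pct = (ip_sums[$3] == 0) ? 0 : host_sums[$2,$3] * 100 / ip_sums[$3]
        printf "%s,%s,%d,%.2f%%\n", $2, $3, host_sums[$2,$3], pct
    }
    ' "$1" "$1"                     # same file named twice, so awk reads it twice
}

# Usage on a small subset of the sample data (temporary file used for illustration)
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Time,Source,destinationIP,Count
11,server1,123.12.23.122,10
11,server2,123.12.23.122,300
12,server1,123.12.23.122,0
12,server2,123.12.23.122,300
EOF
pct_two_pass "$tmp"
rm -f "$tmp"
```

The trade-off is that the file is read twice, which is usually acceptable for a log of this size; if a specific ordering is required, piping to sort as shown earlier remains the simpler option.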