Let's say I have a list:
[('66.162.222.50', 4), ('88.112.19.251', 4), ('207.241.237.226', 3), ('207.241.237.224', 2), ('207.241.237.103', 2), ('74.125.19.82', 1)]
Each tuple holds an IP address and the number of times that address was found in a log: ('IP address', count)
What would be the easiest way for me to create a new list with the same IP addresses, but with each count changed to its share of the total? In this example the counts are 4 + 4 + 3 + 2 + 2 + 1, which adds up to 16 log entries, so the new list should look like this:
[('66.162.222.50', 0.25), ('88.112.19.251', 0.25), ('207.241.237.226', 0.1875), ('207.241.237.224', 0.125), ('207.241.237.103', 0.125), ('74.125.19.82', 0.0625)]
Thank you in advance
CodePudding user response:
I would approach this problem like so: first sum all the counts to get the total number of log entries, then iterate through the tuples and build a new list in which each count is replaced by count / total (a fraction of the total, which is what your expected output shows).
lst = [('66.162.222.50', 4), ('88.112.19.251', 4), ('207.241.237.226', 3), ('207.241.237.224', 2), ('207.241.237.103', 2), ('74.125.19.82', 1)]
total = sum(i[1] for i in lst)
new_lst = [(i[0], i[1]/total) for i in lst]
print(new_lst) # output: [('66.162.222.50', 0.25), ('88.112.19.251', 0.25), ('207.241.237.226', 0.1875), ('207.241.237.224', 0.125), ('207.241.237.103', 0.125), ('74.125.19.82', 0.0625)]
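The same two steps can also be written with tuple unpacking and wrapped in a function. This is only a sketch: the name to_fractions is made up for illustration, and reporting 0.0 for every address when the total is zero is my own assumption, not something the question specifies.

```python
def to_fractions(pairs):
    """Return (ip, count / total) for every (ip, count) pair."""
    total = sum(count for _, count in pairs)
    if total == 0:
        # Assumption: with a zero total, report 0.0 for each IP
        # instead of raising ZeroDivisionError.
        return [(ip, 0.0) for ip, _ in pairs]
    return [(ip, count / total) for ip, count in pairs]


lst = [('66.162.222.50', 4), ('88.112.19.251', 4), ('207.241.237.226', 3),
       ('207.241.237.224', 2), ('207.241.237.103', 2), ('74.125.19.82', 1)]
new_lst = to_fractions(lst)
print(new_lst)
```

Unpacking each tuple as ip, count avoids the i[0] / i[1] indexing and makes it clearer which field is which.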
CodePudding user response:
You could sum the counts to get the total and then use a list comprehension to calculate each share of it, like so:
# dummy data
x = [('1', 2), ('2', 3), ('3', 4)]
# calculate the total
total = sum(a[1] for a in x)
# calculate each count as a fraction of the total
y = [(a[0], a[1] / total) for a in x]
Here x is your input and y is your output.
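If you want values out of 100 rather than fractions, the same comprehension can multiply by 100. A small sketch on the same dummy data; rounding to two decimals and the name pct are my choices for illustration, not something the question asks for.

```python
# dummy data: (label, count) pairs
x = [('1', 2), ('2', 3), ('3', 4)]

# total of all counts (9 for this data)
total = sum(count for _, count in x)

# each count as a percentage of the total, rounded to two decimals
pct = [(label, round(100 * count / total, 2)) for label, count in x]
print(pct)
```

Note that this relies on Python 3, where / is true division; on Python 2 you would need float(total) to avoid integer division.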