How can I group values that share the same key and take the average of each group?
[['Absard', 140000.0], ['Absard', 150000.0], ['Absard', 150000.0], ['Absard', 173333.3], ['Abuzar', 28333.3], ['Abuzar', 34000.0], ['Abuzar', 90500.0], ['Afsarieh', 37333.3], ['Afsarieh', 44333.3], ['Afsarieh', 51666.6], ['Afsarieh', 55000.0], ['Afsarieh', 80000.0], ['Afsarieh', 105000.0], ['Ahang', 26666.6], ['Ahang', 46666.6], ['Air force', 55000.0], ['Air force', 56333.3]]
I want to print this list in the following format:
NAME AVG.
----------------------
Abazar 3033333.33
Ahang 2666666.67
Air force 2333333.33
Afsarieh 1916666.67
The averages for the five names should be stored in a dictionary (the values shown above are made up).
I wrote something like this, but it does not aggregate properly:
def takeAvg(addressList):
    resultDict = {}
    tot = 0
    startIndex = 0
    inner_i = 1
    for inner_i in range(len(addressList)):
        if(addressList[startIndex][0] == addressList[inner_i][0]):
            tot = float(addressList[inner_i][1])
        else:
            tot = float(addressList[startIndex][1])
            resultDict.update({addressList[startIndex][0]: format(float(tot / (inner_i-startIndex)), ".2f")})
            tot = 0
            startIndex = inner_i
    return resultDict
CodePudding user response:
You can iterate over your data, collecting the values into a defaultdict of lists, and then average the values for each key:
from collections import defaultdict
data = [['Absard', 140000.0], ['Absard', 150000.0], ['Absard', 150000.0], ['Absard', 173333.3], ['Abuzar', 28333.3], ['Abuzar', 34000.0], ['Abuzar', 90500.0], ['Afsarieh', 37333.3], ['Afsarieh', 44333.3], ['Afsarieh', 51666.6], ['Afsarieh', 55000.0], ['Afsarieh', 80000.0], ['Afsarieh', 105000.0], ['Ahang', 26666.6], ['Ahang', 46666.6], ['Air force', 55000.0], ['Air force', 56333.3]]
acc = defaultdict(list)
for name, value in data:
    acc[name].append(value)
result = { k : sum(v)/len(v) for k, v in acc.items() }
Output:
{
'Absard': 153333.325,
'Abuzar': 50944.43333333333,
'Afsarieh': 62222.200000000004,
'Ahang': 36666.6,
'Air force': 55666.65
}
For display purposes, you can use an f-string to format the values to 2 decimal places, e.g.:
print(*[f'{k:16}\t{v:.2f}\n' for k, v in result.items()], sep='', end='')
Output:
Absard 153333.33
Abuzar 50944.43
Afsarieh 62222.20
Ahang 36666.60
Air force 55666.65
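If you also want the NAME / AVG. header from the question, here is a minimal sketch. The helper name format_table and the column widths are my own choices, not anything the question prescribes, and the result dict is simply copied from the output above so the snippet runs on its own.

```python
# Averages copied from the output above; in practice this would be the
# `result` dict built by the defaultdict code.
result = {
    'Absard': 153333.325,
    'Abuzar': 50944.43333333333,
    'Afsarieh': 62222.200000000004,
    'Ahang': 36666.6,
    'Air force': 55666.65,
}


def format_table(averages, name_width=12, value_width=12):
    """Return the header, a separator and one row per name as a list of lines.

    Hypothetical helper: the widths are arbitrary and only affect alignment.
    """
    lines = [
        f"{'NAME':<{name_width}}{'AVG.':>{value_width}}",
        '-' * (name_width + value_width),
    ]
    for name, avg in averages.items():
        # Left-align the name, right-align the average with 2 decimals.
        lines.append(f"{name:<{name_width}}{avg:>{value_width}.2f}")
    return lines


table = format_table(result)
print('\n'.join(table))
```

Building the lines first and joining them once avoids having to juggle the sep and end arguments of print.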
CodePudding user response:
If you convert it to a dict with lists as values, and then iterate the dict to get the average, you can do it like this:
data = [['Absard', 140000.0], ['Absard', 150000.0], ['Absard', 150000.0], ['Absard', 173333.3], ['Abuzar', 28333.3], ['Abuzar', 34000.0], ['Abuzar', 90500.0], ['Afsarieh', 37333.3], ['Afsarieh', 44333.3], ['Afsarieh', 51666.6], ['Afsarieh', 55000.0], ['Afsarieh', 80000.0], ['Afsarieh', 105000.0], ['Ahang', 26666.6], ['Ahang', 46666.6], ['Air force', 55000.0], ['Air force', 56333.3]]
res = {_lst[0]: [] for _lst in data}
for _lst in data:
    name, num = _lst
    res[name].append(num)
res = {k: round(sum(v) / len(v), 2) for k, v in res.items()}
print(res)
{'Absard': 153333.33, 'Abuzar': 50944.43, 'Afsarieh': 62222.2, 'Ahang': 36666.6, 'Air force': 55666.65}
It's not the most efficient solution, since it iterates over the data more than once, but I hope it's easy to follow.
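If the repeated iteration bothers you, one possible variant keeps a running sum and count per name, so the input is walked only once and no intermediate lists are stored. This is just a sketch: average_by_key is a name I made up, and the data here is a shortened sample of the list from the question.

```python
# Shortened sample of the question's data, to keep the example small.
data = [['Ahang', 26666.6], ['Ahang', 46666.6],
        ['Air force', 55000.0], ['Air force', 56333.3]]


def average_by_key(pairs):
    """Average the values for each name in a single pass over the input."""
    totals = {}  # name -> [running sum, count]
    for name, value in pairs:
        entry = totals.setdefault(name, [0.0, 0])
        entry[0] += value
        entry[1] += 1
    # Turn each [sum, count] pair into a mean.
    return {name: total / count for name, (total, count) in totals.items()}


averages = average_by_key(data)
print(averages)
```

Unlike the groupby approach, this does not care whether equal names are adjacent in the input.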
CodePudding user response:
You can use itertools.groupby:
from itertools import groupby
l = [['Absard', 140000.0], ['Absard', 150000.0], ['Absard', 150000.0], ['Absard', 173333.3], ['Abuzar', 28333.3], ['Abuzar', 34000.0], ['Abuzar', 90500.0], ['Afsarieh', 37333.3], ['Afsarieh', 44333.3], ['Afsarieh', 51666.6], ['Afsarieh', 55000.0], ['Afsarieh', 80000.0], ['Afsarieh', 105000.0], ['Ahang', 26666.6], ['Ahang', 46666.6], ['Air force', 55000.0], ['Air force', 56333.3]]
for k, g in groupby(l, key=lambda x: x[0]):
    values = [_[1] for _ in g]
    print(k, sum(values) / len(values), sep='\t')
Output:
Absard 153333.325
Abuzar 50944.43333333333
Afsarieh 62222.200000000004
Ahang 36666.6
Air force 55666.65
This assumes that all entries with the same key are adjacent. If, for example, "Absard" occurred again at the end of the list, you would get two separate means for "Absard". You can guarantee adjacency by sorting the list before passing it to itertools.groupby:
l = sorted(l, key=lambda x: x[0])
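To see the difference sorting makes, here is a small sketch on deliberately shuffled data. The rows list and the group_means helper are made up for illustration; the values are taken from the question.

```python
from itertools import groupby
from operator import itemgetter

# Made-up ordering: the two names are interleaved on purpose.
rows = [['Ahang', 26666.6], ['Air force', 55000.0],
        ['Ahang', 46666.6], ['Air force', 56333.3]]


def group_means(rows, presort):
    """Return (name, mean) pairs from groupby, optionally sorting by name first."""
    if presort:
        rows = sorted(rows, key=itemgetter(0))
    means = []
    for name, group in groupby(rows, key=itemgetter(0)):
        values = [value for _, value in group]
        means.append((name, sum(values) / len(values)))
    return means


unsorted_means = group_means(rows, presort=False)
sorted_means = group_means(rows, presort=True)

# Without sorting, every run of equal names becomes its own group,
# so the interleaved input yields four groups instead of two.
print(len(unsorted_means), len(sorted_means))
```

Sorting costs O(n log n), so if the input is not already grouped, a dictionary-based approach may be the simpler choice.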