I would like to perform a group by on a list and calculate the average.
Here is the list:
[['Profit ratio', [[2016, 5], [2017, 10], [2018, 5], [2016, 5], [2017, 20], [2018, 10]]]]
After grouping and averaging I would like the following:
[['Profit ratio', [[2016, 5], [2017, 15], [2018, 7.5]]]]
I have tried doing this with a loop that gathers the years, appends the numbers to the end, and then calculates the average. Is there a better approach?
CodePudding user response:
Yeah, this seems fairly straightforward. Assuming your data is:
data_with_headers = [['Profit ratio',
                      [[2016, 5],
                       [2017, 10],
                       [2018, 5],
                       [2016, 5],
                       [2017, 20],
                       [2018, 10]]]]
And that there are more headers here than just "Profit ratio", you could do something like:
from collections import defaultdict

result = []
for header, values in data_with_headers:
    # Collect every value seen for each year.
    raw_data = defaultdict(list)
    for year, value in values:
        raw_data[year].append(value)
    # Average each year's values; dicts keep the first-seen order of the years.
    averages = [[year, sum(vals) / len(vals)] for year, vals in raw_data.items()]
    result.append([header, averages])

assert result == [['Profit ratio', [[2016, 5.0], [2017, 15.0], [2018, 7.5]]]]
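If you'd rather have something that reads like a literal "group by", here is a rough sketch of the same idea using itertools.groupby and statistics.fmean (Python 3.8+). The helper name average_by_year is made up for this example, and it assumes every row is a [year, value] pair. Note that groupby only groups adjacent items, so the rows are sorted by year first; that also means the output comes back ordered by year rather than in first-seen order.

```python
from itertools import groupby
from operator import itemgetter
from statistics import fmean


def average_by_year(rows):
    """Hypothetical helper: average the values of [year, value] pairs per year."""
    # groupby only merges adjacent items, so sort by year first.
    by_year = sorted(rows, key=itemgetter(0))
    return [
        [year, fmean(value for _, value in group)]
        for year, group in groupby(by_year, key=itemgetter(0))
    ]


data_with_headers = [['Profit ratio',
                      [[2016, 5], [2017, 10], [2018, 5],
                       [2016, 5], [2017, 20], [2018, 10]]]]

# Apply the helper to every [header, rows] entry.
result = [[header, average_by_year(values)] for header, values in data_with_headers]
```

For small lists like this one, the defaultdict version is probably just as good and avoids the sort; this is mostly a matter of taste.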