How to group data by labels and return a list of calculated values?

Suppose I have two lists with the same number of elements. The first contains floating-point numbers, the second contains string labels. For example:

[1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
[ "ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]

Suppose I also have an ordered list of the unique labels:

["ABC", "LMN", "XYZ"]

I want to write the most efficient Python code that:

groups values by label
applies a specific function to each group of values (e.g. sum, mean, stdev)
returns a list of calculated values in the same order as the label list.

For example, if the function is sum, I expect a list of three values:

[sum([1.98, 9.35, 6.23]), sum([5.56, 7.49]), sum([4.34, 2.54, 8.31])]

If the function is mean, I expect a list of three values:

[mean([1.98, 9.35, 6.23]), mean([5.56, 7.49]), mean([4.34, 2.54, 8.31])]

Any hint?

CodePudding user response：

First, reshape your data using a dictionary to group the values per key. Since Python 3.7, dictionaries are guaranteed to preserve the insertion order of their keys, so building the dictionary from the ordered label list keeps the results in that order.

values = [1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
keys = ["ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]

order = ["ABC", "LMN", "XYZ"]

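# one empty list per label, created in the desired output order
out = {k: [] for k in order}

# single pass: append each value to its label's list
for key, value in zip(keys, values):
    out[key].append(value)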

Output:

>>> out
{'ABC': [1.98, 9.35, 6.23], 'LMN': [5.56, 7.49], 'XYZ': [4.34, 2.54, 8.31]}

Then apply any transform you want:

# sum
[round(sum(v), 2) for v in out.values()]
# [17.56, 13.05, 15.19]

# mean
from statistics import mean
[round(mean(v), 2) for v in out.values()]
# [5.85, 6.53, 5.06]
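
The two steps above can be wrapped in one small helper. This is only a sketch: the name `group_apply` and its parameters are made up for illustration, and it assumes every key appears in `order` and every label has enough values for the chosen function (`stdev` needs at least two).

```python
from statistics import mean, stdev

def group_apply(values, keys, order, func):
    """Group values by key, then apply func to each group, in the given order."""
    groups = {label: [] for label in order}   # one empty bucket per label
    for key, value in zip(keys, values):      # single pass over the data
        groups[key].append(value)             # raises KeyError if key is not in order
    return [func(groups[label]) for label in order]

values = [1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
keys = ["ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]
order = ["ABC", "LMN", "XYZ"]

sums = group_apply(values, keys, order, sum)
means = group_apply(values, keys, order, mean)
stdevs = group_apply(values, keys, order, stdev)
maxes = group_apply(values, keys, order, max)   # [9.35, 7.49, 8.31]
```

Each call walks the data once and then the labels once, so the cost grows with the length of the data plus the number of labels.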

Order of the initial keys

If you prefer, you can keep the keys in the order in which they first appear in your list of keys, with no need for an explicit list of ordered keys:

from collections import defaultdict

values = [1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
keys = ["ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]

out = defaultdict(list)
for key, value in zip(keys, values):
    out[key].append(value)
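
To make the `defaultdict` variant concrete, here is a small sketch of what it yields on the question's data. The names `first_seen`, `maxima`, `custom_order` and `reordered` are mine, and `custom_order` is a made-up ordering used only to show how to reorder the result.

```python
from collections import defaultdict

values = [1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
keys = ["ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]

out = defaultdict(list)
for key, value in zip(keys, values):
    out[key].append(value)

# Keys are stored in order of first appearance in `keys`.
first_seen = list(out)                    # ['ABC', 'LMN', 'XYZ']
maxima = [max(v) for v in out.values()]   # [9.35, 7.49, 8.31]

# A different output order only needs one lookup per label.
custom_order = ["XYZ", "ABC", "LMN"]      # hypothetical ordering
reordered = [max(out[label]) for label in custom_order]  # [8.31, 9.35, 7.49]
```

Note that indexing a `defaultdict` with a label that never occurred silently creates an empty list, so `max` would then raise `ValueError`.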

CodePudding user response：

data = [1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
keys = ["ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]
labels = ["ABC", "LMN", "XYZ"]
func = max  # any function that accepts a list, e.g. sum or statistics.mean

# for each label, collect its values, then apply func to them
result = [func([v for k, v in zip(keys, data) if k == label]) for label in labels]
print(result)  # [9.35, 7.49, 8.31]

You can swap in any function you like. My code nests the loops in a single line, at the cost of rescanning the whole keys list once for every label.
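
One caveat with the comprehension: a label with no matching values produces an empty list, and functions such as `max` or `statistics.mean` raise an error on empty input. Below is a sketch of one way to guard against that. The helper `safe_apply`, its `default` parameter, and the extra label "QRS" are my own assumptions, added only to trigger the empty case.

```python
def safe_apply(func, group, default=None):
    """Apply func to group, or return default when the group is empty."""
    return func(group) if group else default

data = [1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
keys = ["ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]
labels = ["ABC", "LMN", "XYZ", "QRS"]   # "QRS" is a made-up label with no data

result = [
    safe_apply(max, [v for k, v in zip(keys, data) if k == label])
    for label in labels
]
print(result)  # [9.35, 7.49, 8.31, None]
```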