Suppose I have two lists with the same number of elements. The first contains floating-point numbers, the second contains string labels. For example:
[1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
[ "ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]
Suppose I also have an ordered list of the unique labels:
["ABC", "LMN", "XYZ"]
I want to write the most efficient Python code that:
- groups the values by label
- applies a specific function to each group of values (e.g. sum, mean, stdev)
- returns a list of the calculated values in the same order as the label list.
For example, if the function is sum, I expect a list of three values:
[sum([1.98, 9.35, 6.23]), sum([5.56, 7.49]), sum([4.34, 2.54, 8.31])]
If the function is mean, I expect a list of three values:
[mean([1.98, 9.35, 6.23]), mean([5.56, 7.49]), mean([4.34, 2.54, 8.31])]
Any hint?
CodePudding user response:
values = [1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
keys = ["ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]
order = ["ABC", "LMN", "XYZ"]
out = {k: [] for k in order}
for key, value in zip(keys, values):
    out[key].append(value)
CodePudding user response:
First, reshape your data using a dictionary that groups the values per key. Since Python 3.7, dictionaries are guaranteed to preserve the insertion order of their keys, so iterating over the dictionary follows the order of your label list.
values = [1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
keys = ["ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]
order = ["ABC", "LMN", "XYZ"]
out = {k: [] for k in order}
for key, value in zip(keys, values):
    out[key].append(value)
Output:
>>> out
{'ABC': [1.98, 9.35, 6.23], 'LMN': [5.56, 7.49], 'XYZ': [4.34, 2.54, 8.31]}
Then apply any transform you want:
# sum
[round(sum(v), 2) for v in out.values()]
# [17.56, 13.05, 15.19]
# mean
from statistics import mean
[round(mean(v), 2) for v in out.values()]
# [5.85, 6.53, 5.06]
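The two steps above (group, then transform) can be wrapped in one helper. This is only a minimal sketch: the name group_apply and its parameters are hypothetical, not part of the original answer, and it assumes every key appears in the order list.

```python
def group_apply(values, keys, order, func):
    """Group values by key, then apply func to each group, following order.

    Hypothetical helper for illustration; assumes every key is in `order`
    (an unknown key would raise KeyError).
    """
    groups = {label: [] for label in order}
    # Single pass over the paired lists
    for key, value in zip(keys, values):
        groups[key].append(value)
    # Index by label so the result follows `order` explicitly
    return [func(groups[label]) for label in order]


values = [1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
keys = ["ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]
order = ["ABC", "LMN", "XYZ"]

sums = [round(s, 2) for s in group_apply(values, keys, order, sum)]
maxima = group_apply(values, keys, order, max)
print(sums)    # [17.56, 13.05, 15.19]
print(maxima)  # [9.35, 7.49, 8.31]
```

Any callable that accepts a list works as func, for example statistics.mean or statistics.stdev. Note that a label with no values yields an empty group, which sum handles (it returns 0) but mean and max do not.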
Order of the initial keys
If you prefer, you can skip the explicit list of ordered keys and keep the labels in the order in which they first appear in your list of keys:
from collections import defaultdict
values = [1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
keys = ["ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]
out = defaultdict(list)
for key, value in zip(keys, values):
    out[key].append(value)
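To make the resulting order concrete, here is a small self-contained sketch of the defaultdict variant. The variable names labels and maxima are introduced here for illustration only.

```python
from collections import defaultdict

values = [1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
keys = ["ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]

out = defaultdict(list)
for key, value in zip(keys, values):
    out[key].append(value)

# Labels come out in order of first appearance in `keys`
labels = list(out)
# Apply any function per group; max is used here as an example
maxima = [max(group) for group in out.values()]

print(labels)  # ['ABC', 'LMN', 'XYZ']
print(maxima)  # [9.35, 7.49, 8.31]
```

With this sample data the first-appearance order happens to match the explicit label list. In general it may not, so keep the explicit list if a specific output order is required.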
CodePudding user response:
data = [1.98, 5.56, 4.34, 9.35, 6.23, 2.54, 8.31, 7.49]
keys = [ "ABC", "LMN", "XYZ", "ABC", "ABC", "XYZ", "XYZ", "LMN"]
labels = ["ABC", "LMN", "XYZ"]
func = max
result = [func([d for d, k in zip(data, keys) if k == label]) for label in labels]
print(result)
You can replace func with any function you want. This code nests the loops into a single line, at the cost of rescanning the keys once per label.