I have done a parameter study (image compression) that takes three parameters (x1, x2, x3) and produces a result y (compression rate) for each of 50 files. Now I want to find the parameter combination that gives the minimum mean compression rate over all files. I could iterate over all parameter combinations with Python for loops and keep the best result (as shown in the minimal example below). However, I suspect there is a faster and more concise solution using the pandas API.
import pandas as pd

df = pd.DataFrame({
    "result": [4, 3, 2, 1],
    "parameter": [1, 0, 1, 0],
    "file": ["A", "A", "B", "B"]
})
min_result = (float("inf"), None)  # Start above any possible mean
for parameter in [0, 1]:  # Iterating over [0, 1]
    result = df[df["parameter"] == parameter]["result"].mean()  # Mean over all files
    if result <= min_result[0]:  # Keep the smallest mean seen so far
        min_result = (float(result), parameter)
print(min_result)  # >>> (2.0, 0)
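For reference, the loop approach described in the question, extended to three parameters, could be sketched with itertools.product. The column names x1, x2, x3, y and file, and the formula that generates y, are assumptions made up for illustration; they are not the real study data.

```python
import itertools

import pandas as pd

# Synthetic long-format data (an assumption, not real results): one row per
# file and (x1, x2, x3) combination, with a made-up compression rate y.
rows = [
    {"file": f, "x1": x1, "x2": x2, "x3": x3,
     "y": 5 + x1 + 2 * x2 - x3 + (1 if f == "B" else 0)}
    for x1, x2, x3 in itertools.product([0, 1], [0, 1], [0, 1])
    for f in ["A", "B"]
]
df = pd.DataFrame(rows)

best_mean, best_params = float("inf"), None
# Try every combination of the distinct values of each parameter column.
for combo in itertools.product(*(sorted(df[c].unique()) for c in ["x1", "x2", "x3"])):
    mask = (df["x1"] == combo[0]) & (df["x2"] == combo[1]) & (df["x3"] == combo[2])
    if not mask.any():  # skip combinations that were never run
        continue
    mean = df.loc[mask, "y"].mean()  # mean compression rate over all files
    if mean < best_mean:
        best_mean, best_params = float(mean), tuple(int(v) for v in combo)

print(best_mean, best_params)  # prints: 4.5 (0, 0, 1)
```

Every combination triggers a full boolean scan of the frame, which is the overhead a single groupby avoids.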
Answer:
It looks like you want a simple GroupBy.mean:
out = df.groupby('parameter')['result'].mean()
NB. If you have several parameter columns, group by all of them: groupby(['col1', 'col2', ...])
output:
parameter
0    2.0
1    3.0
Name: result, dtype: float64
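Applied to the three parameters from the question, the multi-column groupby from the NB could look like the sketch below. The column names x1, x2, x3 and y and the synthetic values are assumptions; the point is that the result is a Series with one mean per parameter combination, indexed by a MultiIndex.

```python
import itertools

import pandas as pd

# Synthetic data (an assumption): one row per file and (x1, x2, x3) combination.
rows = [
    {"file": f, "x1": x1, "x2": x2, "x3": x3,
     "y": 5 + x1 + 2 * x2 - x3 + (1 if f == "B" else 0)}
    for x1, x2, x3 in itertools.product([0, 1], [0, 1], [0, 1])
    for f in ["A", "B"]
]
df = pd.DataFrame(rows)

# One mean per parameter combination, averaged over the files.
# The index is a MultiIndex with the levels x1, x2 and x3.
out = df.groupby(["x1", "x2", "x3"])["y"].mean()
print(out)
```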
To get the minimum:
idx = out.idxmin()
min_result = (out[idx], idx)
output: (2.0, 0)
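The groupby and idxmin steps can be wrapped in one reusable function. The helper name best_combination and its arguments are hypothetical, and the three-parameter data is synthetic. With a single parameter column idxmin returns a scalar label; with several columns it returns a tuple of parameter values.

```python
import itertools

import pandas as pd


def best_combination(df, param_cols, value_col):
    """Return (lowest mean of value_col, winning parameter value(s)).

    Hypothetical helper: param_cols may be one column name or a list of
    names; with several names the winning parameters come back as a tuple.
    """
    means = df.groupby(param_cols)[value_col].mean()  # mean over all files
    idx = means.idxmin()  # index label of the smallest mean
    return float(means.loc[idx]), idx


# The toy frame from the question: a single parameter column.
df = pd.DataFrame({
    "result": [4, 3, 2, 1],
    "parameter": [1, 0, 1, 0],
    "file": ["A", "A", "B", "B"],
})
min_result, best = best_combination(df, "parameter", "result")
print(min_result, best)  # prints: 2.0 0

# Synthetic three-parameter data (an assumption, not real study results).
rows = [
    {"file": f, "x1": x1, "x2": x2, "x3": x3,
     "y": 5 + x1 + 2 * x2 - x3 + (1 if f == "B" else 0)}
    for x1, x2, x3 in itertools.product([0, 1], [0, 1], [0, 1])
    for f in ["A", "B"]
]
min_y, best_xs = best_combination(pd.DataFrame(rows), ["x1", "x2", "x3"], "y")
print(min_y, *best_xs)  # prints: 4.5 0 0 1
```

The mean is converted to a plain float because pandas returns NumPy scalars, and with NumPy 2 a NumPy scalar inside a printed tuple shows up as np.float64(2.0) rather than 2.0.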