Determining best parameter combination with pandas

I ran a parameter study (image compression) that takes three parameters (x1, x2, x3) and produces a result y (compression rate) for 50 files. I want to find the parameter combination that gives the minimum mean compression rate over all files. I could iterate over all parameter combinations with Python for loops and keep the best result (as shown in the minimal example below). However, I suspect there is a faster and more concise solution using the pandas API.

import pandas as pd


df = pd.DataFrame({
    "result": [4, 3, 2, 1],
    "parameter": [1, 0, 1, 0],
    "file": ["A", "A", "B", "B"]
})

min_result = (float("inf"), None)  # Start at +inf so the first mean always replaces it
for parameter in [0, 1]:  # Iterating over [0, 1]
    result = df[df["parameter"] == parameter]["result"].mean()  # Mean value of all files
    if result <= min_result[0]:  # Choosing the smallest result
        min_result = (result, parameter)

print(min_result)  # >>> (2.0, 0)

CodePudding user response:

It looks like you want a simple GroupBy.mean:

out = df.groupby('parameter')['result'].mean()

NB. if the parameters are spread over several columns, group by all of them: groupby(['col1', 'col2'...])

output:

parameter
0    2.0
1    3.0
Name: result, dtype: float64

minimum:

idx = out.idxmin()  # index label (parameter value) with the smallest mean
min_result = (out.loc[idx], idx)

output: (2.0, 0)
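
For the three-parameter case from the question, the same idea could look like the sketch below. The column names x1, x2, x3, file and y are taken from the question's description, but the data itself is made up purely for illustration (a 2 x 2 x 2 grid on two hypothetical files), so treat it as an assumption rather than the real study layout. With several grouping columns, idxmin returns a tuple with one label per column.

```python
import itertools

import pandas as pd

# Hypothetical data: a 2 x 2 x 2 parameter grid evaluated on two files.
# Column names follow the question; the y values are invented for the demo.
grid = list(itertools.product([0, 1], [0, 1], [0, 1]))
rows = []
for file_name, offset in [("A", 0.0), ("B", 1.0)]:
    for x1, x2, x3 in grid:
        rows.append({
            "file": file_name,
            "x1": x1,
            "x2": x2,
            "x3": x3,
            "y": 10.0 - 4 * x1 + 2 * x2 - x3 + offset,
        })
df = pd.DataFrame(rows)

# Mean compression rate per parameter combination, averaged over all files.
means = df.groupby(["x1", "x2", "x3"])["y"].mean()

best_params = means.idxmin()        # tuple of (x1, x2, x3) index labels
best_mean = means.loc[best_params]  # mean y for that combination

print(best_params, best_mean)
```

For this made-up data the best combination is x1=1, x2=0, x3=1 with a mean of 5.5. If you also want to see the runners-up, means.sort_values().head() lists the combinations with the smallest means.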