pandas groupby per-group value

Time:04-23

I have this data:

import pandas as pd

df = pd.DataFrame({
    "dim1":   [ "aaa", "aaa", "aaa", "aaa", "aaa", "aaa" ],
    "dim2":   [ "xxx", "xxx", "xxx", "yyy", "yyy", "yyy" ],
    "iter":   [     0,     1,     2,     0,     1,     2 ],
    "value1": [   100,   101,    99,   500,   490,   510 ],
    "value2": [ 10000, 10100,  9900, 50000, 49000, 51000 ],
})

I then group by dim1/dim2 and, across all iterations, keep the value1/value2 of the row with the minimum value1:

df = df.groupby(["dim1", "dim2"], group_keys=False) \
    .apply(lambda x: x.sort_values("value1").head(1)).drop(columns=["iter"])

which returns:

dim1    dim2    value1  value2
 aaa    xxx         99    9900
 aaa    yyy        490   49000
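For the first step, a common alternative to sort_values + head(1) inside apply is idxmin, which returns the index label of the minimum row per group. The sketch below rebuilds the question's sample frame and uses df.loc to pick those rows. On ties, idxmin returns the first occurrence. The variable names idx and best are only illustrative.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "dim1":   ["aaa", "aaa", "aaa", "aaa", "aaa", "aaa"],
    "dim2":   ["xxx", "xxx", "xxx", "yyy", "yyy", "yyy"],
    "iter":   [0, 1, 2, 0, 1, 2],
    "value1": [100, 101, 99, 500, 490, 510],
    "value2": [10000, 10100, 9900, 50000, 49000, 51000],
})

# For each (dim1, dim2) group, idxmin gives the index label of the row
# with the smallest value1; .loc then selects those full rows.
idx = df.groupby(["dim1", "dim2"])["value1"].idxmin()
best = df.loc[idx].drop(columns=["iter"])
print(best)
```

This keeps the original index labels (2 and 4 here), just like the group_keys=False version.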

My question: how can I add a new column that holds the minimum value1 per dim1 group, so the result looks like this?

dim1    dim2    value1  value2     new_col
 aaa    xxx         99    9900          99
 aaa    yyy        490   49000          99

I tried something like this, which didn't work:

df["new_col"] = df.groupby(["dim1"], group_keys=False) \
    .apply(lambda x: x.value1.head(1))

Answer:

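IIUC, you can use .groupby + .transform afterwards. Unlike an aggregation, transform("min") returns one value per original row (each row gets the minimum of its dim1 group) with the same index as df, so it can be assigned directly as a new column. (Note also that head(1) takes the first row of each group, not its minimum.)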

df["new_col"] = df.groupby("dim1")["value1"].transform("min")
print(df)

Prints:

  dim1 dim2  value1  value2  new_col
2  aaa  xxx      99    9900       99
4  aaa  yyy     490   49000       99
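To see why transform lines up with the rows while a plain aggregation does not, the sketch below starts from the frame produced by the first step (index labels 2 and 4) and compares groupby(...).min() with transform("min"). It also shows an equivalent alternative using Series.map on the aggregated minimums; the column name new_col_map is hypothetical and only there for comparison.

```python
import pandas as pd

# The frame produced by the first step in the question (index labels 2 and 4 kept).
df = pd.DataFrame(
    {
        "dim1": ["aaa", "aaa"],
        "dim2": ["xxx", "yyy"],
        "value1": [99, 490],
        "value2": [9900, 49000],
    },
    index=[2, 4],
)

# Aggregating gives one value per dim1 group, indexed by dim1 ...
per_group_min = df.groupby("dim1")["value1"].min()
print(per_group_min)

# ... while transform broadcasts that minimum back onto every original row,
# so the result shares df's index and can be assigned as a column directly.
df["new_col"] = df.groupby("dim1")["value1"].transform("min")

# Equivalent alternative: look up each row's dim1 in the aggregated Series.
df["new_col_map"] = df["dim1"].map(per_group_min)
print(df)
```

Both columns end up as 99 for every row; transform is usually the more direct choice, while map is handy when the per-group values are already computed.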