I have this data:
import pandas as pd

df = pd.DataFrame({
"dim1": [ "aaa", "aaa", "aaa", "aaa", "aaa", "aaa" ],
"dim2": [ "xxx", "xxx", "xxx", "yyy", "yyy", "yyy" ],
"iter": [ 0, 1, 2, 0, 1, 2 ],
"value1": [ 100, 101, 99, 500, 490, 510 ],
"value2": [ 10000, 10100, 9900, 50000, 49000, 51000 ],
})
I then group by dim1/dim2 and, out of all iterations, pick value1/value2 from the row with the minimum value1:
df = df.groupby(["dim1", "dim2"], group_keys=False) \
.apply(lambda x: x.sort_values("value1").head(1)).drop(columns=["iter"])
which returns:
dim1 dim2 value1 value2
aaa xxx 99 9900
aaa yyy 490 49000
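A possibly simpler way to get the same rows is sketched below. It swaps in `idxmin` instead of sorting each group. It assumes value1 has no missing values and that, on ties, taking the first occurrence is acceptable.

```python
import pandas as pd

df = pd.DataFrame({
    "dim1": ["aaa"] * 6,
    "dim2": ["xxx", "xxx", "xxx", "yyy", "yyy", "yyy"],
    "iter": [0, 1, 2, 0, 1, 2],
    "value1": [100, 101, 99, 500, 490, 510],
    "value2": [10000, 10100, 9900, 50000, 49000, 51000],
})

# For each (dim1, dim2) group, idxmin gives the index label of the row
# with the smallest value1 (the first such row if there is a tie).
min_rows = df.groupby(["dim1", "dim2"])["value1"].idxmin()

# Select those rows by label and drop the iteration counter.
best = df.loc[min_rows].drop(columns=["iter"])
print(best)
```

This avoids a per-group sort and a Python-level lambda, and it keeps the original index labels (2 and 4), just like the sort/head version.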
My question: how can I add a new column that contains the minimum value1 per dim1 group, like this:
dim1 dim2 value1 value2 new_col
aaa xxx 99 9900 99
aaa yyy 490 49000 99
I tried something like this, which didn't work:
df["new_col"] = df.groupby(["dim1"], group_keys=False) \
.apply(lambda x: x.value1.head(1))
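A likely reason an attempt like this falls short is index alignment. The sketch below uses the simpler filtration call `groupby(...).head(1)`, not the exact `apply` from the question. The call keeps only one row per group, and assigning it back as a column aligns on index labels, so the other rows get NaN. Note also that `head(1)` returns the first row of each group, not the minimum.

```python
import pandas as pd

# The two-row result from the first step (original index labels 2 and 4).
df = pd.DataFrame(
    {"dim1": ["aaa", "aaa"], "dim2": ["xxx", "yyy"],
     "value1": [99, 490], "value2": [9900, 49000]},
    index=[2, 4],
)

# head(1) keeps only the first row of each dim1 group, under its original
# index label (2); it is the first row, not necessarily the minimum.
first_only = df.groupby("dim1")["value1"].head(1)

# Column assignment aligns on the index: label 2 gets 99, while label 4
# has no matching entry and becomes NaN.
df["first_only"] = first_only
print(df)
```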
Answer:
IIUC, you can use .groupby + .transform afterwards:
df["new_col"] = df.groupby("dim1")["value1"].transform("min")
print(df)
Prints:
dim1 dim2 value1 value2 new_col
2 aaa xxx 99 9900 99
4 aaa yyy 490 49000 99
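To show why `transform` fits here, the sketch below contrasts it with a plain aggregation. `min()` collapses to one value per dim1 group, while `transform("min")` returns a Series with the same index as `df`, so it can be assigned directly. The `map` line is an equivalent alternative added for illustration, not part of the original answer.

```python
import pandas as pd

# The two-row result from the first step (original index labels 2 and 4).
df = pd.DataFrame(
    {"dim1": ["aaa", "aaa"], "dim2": ["xxx", "yyy"],
     "value1": [99, 490], "value2": [9900, 49000]},
    index=[2, 4],
)

# Aggregation: one row per dim1 group, indexed by the group key.
per_group = df.groupby("dim1")["value1"].min()

# transform broadcasts each group's minimum back to every row of that
# group, aligned on df's own index.
df["new_col"] = df.groupby("dim1")["value1"].transform("min")

# Alternative: look up each row's group minimum by its dim1 key.
df["new_col_map"] = df["dim1"].map(per_group)
print(df)
```

Both columns end up identical. `transform` is usually the more direct choice because it needs no separate lookup step.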