I have a dataframe, df, with three columns A, B, and C, as given below. I want a dataframe, df2, with three columns X, Y, and Z, where for each category of column A (given in column X), column Z takes the minimum value of column C among the rows whose value in column B is "YES". I would also like to add a column D to df, as shown in df3, that marks the row holding this minimum with 1.
What I have tried:
df2 = df.loc[df.groupby("A")["C"].idxmin()]
This only uses columns A and C. How can I also apply a condition on column B?
df
A B C
1 YES 20
1 YES 13
1 NO 18
1 NO 4
1 NO 9
1 YES 42
2 YES 22
2 NO 2
2 YES 3
2 YES 7
2 NO 51
df2
X Y Z
1 YES 13
2 YES 3
df3
A B C D
1 YES 20 0
1 YES 13 1
1 NO 18 0
1 NO 4 0
1 NO 9 0
1 YES 42 0
2 YES 22 0
2 NO 2 0
2 YES 3 1
2 YES 7 0
2 NO 51 0
Answer:
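Filter to the "YES" rows, group by column A taking the minimum, then rename the columns. The minimum is taken per column, so B comes out as "YES" and C as the smallest value in each group: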
df.query('B == "YES"').groupby('A', as_index=False).min().set_axis(['X', 'Y', 'Z'], axis=1)
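The OP's idxmin attempt can also be extended by filtering first. The sketch below (built from the sample data in the question) selects the actual row holding the minimum C among the "YES" rows of each A group, rather than taking a separate minimum per column, which may matter if df has more columns than shown:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "B": ["YES", "YES", "NO", "NO", "NO", "YES", "YES", "NO", "YES", "YES", "NO"],
    "C": [20, 13, 18, 4, 9, 42, 22, 2, 3, 7, 51],
})

# Keep only the rows where B is "YES", then get, per value of A,
# the index label of the row with the smallest C
idx = df[df["B"] == "YES"].groupby("A")["C"].idxmin()

# Select those rows, rename A, B, C to X, Y, Z and renumber from 0
df2 = df.loc[idx].rename(columns={"A": "X", "B": "Y", "C": "Z"}).reset_index(drop=True)
print(df2)
```

This should print the rows (1, YES, 13) and (2, YES, 3) under the headers X, Y, Z.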
Answer:
You can filter to the "YES" rows, group by A, take the minimum of C, and then rename the columns:
df2 = df[df['B'] == 'YES'].groupby('A')['C'].min().reset_index()
df2.insert(1, 'Y', 'YES')   # Y holds the condition value, placed between A and C
df2 = df2.rename(columns={'A': 'X', 'C': 'Z'})
df2
   X    Y   Z
0  1  YES  13
1  2  YES   3
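Neither answer covers the second part of the question (column D in df3). A minimal sketch, assuming the same sample data: find the index label of the minimum-C "YES" row in each A group with idxmin, then flag those labels with 1 and every other row with 0:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "B": ["YES", "YES", "NO", "NO", "NO", "YES", "YES", "NO", "YES", "YES", "NO"],
    "C": [20, 13, 18, 4, 9, 42, 22, 2, 3, 7, 51],
})

# Index labels of the minimum-C row among the "YES" rows of each A group
idx = df[df["B"] == "YES"].groupby("A")["C"].idxmin()

# D is 1 on exactly those rows and 0 everywhere else
df["D"] = df.index.isin(idx).astype(int)
print(df)
```

Note that idxmin returns only the first occurrence of the minimum, so if two "YES" rows in a group tie for the smallest C, only one of them is flagged.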