I am trying to fill a column within each group of a grouped dataframe with that group's maximum value.
The following is a sample of the data:
import pandas as pd

df1 = [[52, '1', '0'], [52, '1', '1'],
       [52, '1', '0'], [52, '2', '0'],
       [53, '2', '0'], [52, '2', '0']]
df = pd.DataFrame(df1, columns=['Cow', 'Lact', 'fail'])
This produces the following dataframe:
   Cow Lact fail
0   52    1    0
1   52    1    1
2   52    1    0
3   52    2    0
4   53    2    0
5   52    2    0
In this example, I would like to replace the 0 values with 1 (the group maximum) for Cow = 52, Lact = 1:
   Cow Lact fail
0   52    1    1
1   52    1    1
2   52    1    1
3   52    2    0
4   53    2    0
5   52    2    0
I tried, without success, to adapt code from the question "Pandas groupby: change values in one column based on values in another column":
grouped = df.groupby(["Cow", "Lact"], as_index=False).max()['fail']
for i in grouped:
    if i == 1:
        df['fail'] = 1
Solutions, and an explanation of why my approach fails, would be appreciated. Thanks.
CodePudding user response:
You can use groupby in combination with transform('max'). I'm not sure whether you want to replace the 'fail' column or create a new column, but this should give you the expected result:
# Pass the string 'max'; recent pandas versions warn when given the builtin max
df['fail'] = df.groupby(['Cow', 'Lact'])['fail'].transform('max')
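If you would rather keep the original column, the same transform can write to a new column instead. This is a minimal sketch using the sample data from the question; the column name fail_max is only a placeholder I made up for illustration.

```python
import pandas as pd

rows = [[52, '1', '0'], [52, '1', '1'],
        [52, '1', '0'], [52, '2', '0'],
        [53, '2', '0'], [52, '2', '0']]
df = pd.DataFrame(rows, columns=['Cow', 'Lact', 'fail'])

# transform returns one value per original row, so the result lines up
# with df's index and can be assigned straight to a column.
# 'fail_max' is a hypothetical name; 'fail' itself is left untouched.
df['fail_max'] = df.groupby(['Cow', 'Lact'])['fail'].transform('max')

print(df)
```

Note that 'fail' holds strings in the sample, so the maximum is a string comparison. That is fine for single characters like '0' and '1', but you may want to convert the column to integers first if the values can be longer.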
CodePudding user response:
You were almost there; use transform('max') directly:
df['fail'] = df.groupby(["Cow", "Lact"])['fail'].transform('max')
output:
   Cow Lact fail
0   52    1    1
1   52    1    1
2   52    1    1
3   52    2    0
4   53    2    0
5   52    2    0
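As for why the loop in the question does not work, there seem to be two separate problems, which the sketch below tries to show on the same sample data. First, 'fail' is stored as strings, so the test i == 1 compares a string with an integer and never passes. Second, even if it did pass, df['fail'] = 1 assigns to the whole column, not to the rows of one group. The variable name overwritten is just for this illustration.

```python
import pandas as pd

rows = [[52, '1', '0'], [52, '1', '1'],
        [52, '1', '0'], [52, '2', '0'],
        [53, '2', '0'], [52, '2', '0']]
df = pd.DataFrame(rows, columns=['Cow', 'Lact', 'fail'])

# One maximum per (Cow, Lact) group: three values for six rows,
# with no link back to the rows each group came from.
grouped = df.groupby(['Cow', 'Lact'], as_index=False).max()['fail']
print(grouped.tolist())

# The values are strings, so comparing them with the integer 1 is always False.
matches = [i == 1 for i in grouped]
print(matches)

# Had the test passed, this assignment would overwrite every row.
overwritten = df.copy()
overwritten['fail'] = 1
print(overwritten['fail'].tolist())
```

transform avoids both issues because it computes the group maximum and broadcasts it back to each row of that group in one step.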