I am trying to get the maximum amount in the salary column and the minimum amount in the expense column for each run of consecutive valid (non-null) values.
I tried iterating over the rows, but that did not work properly; I am looking for a pandas-based solution.
Name date salary expense
Ram 10 1000 NaN
Ram 12 8000 NaN
Ram 13 NaN 500
Ram 14 NaN 800
Ram 15 3000 NaN
My expected output is:
Name date salary expense
Ram 12 8000 NaN
Ram 13 NaN 500
Ram 15 3000 NaN
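To make the question reproducible, the sample data above can be built as follows. This is only a sketch: the dtypes are assumed (numeric `date`, `salary` and `expense`, with `NaN` for missing values), and the name `expected` is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Sample input from the question; NaN marks a missing value.
df = pd.DataFrame({
    "Name": ["Ram"] * 5,
    "date": [10, 12, 13, 14, 15],
    "salary": [1000, 8000, np.nan, np.nan, 3000],
    "expense": [np.nan, np.nan, 500, 800, np.nan],
})

# Expected result: one row per run of consecutive non-null values
# (max salary for salary runs, min expense for expense runs).
expected = pd.DataFrame({
    "Name": ["Ram"] * 3,
    "date": [12, 13, 15],
    "salary": [8000, np.nan, 3000],
    "expense": [np.nan, 500, np.nan],
})

print(df)
```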
CodePudding user response:
Assuming 'salary' and 'expense' are mutually exclusive within a row, build a custom grouping key that increments whenever 'salary' switches between null and non-null, sort the rows so that the maximum salary and the minimum expense come first, and take the first row of each group:
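# new group id each time 'salary' switches between null and non-null
group = df['salary'].isna().diff().fillna(False).cumsum()
# sort so the max salary / min expense row leads each group, then keep it
(df.sort_values(by=['salary', 'expense'], ascending=[False, True])
   .groupby(group).first()
   .reset_index(drop=True)
)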
output:
Name date salary expense
0 Ram 12 8000.0 NaN
1 Ram 13 NaN 500.0
2 Ram 15 3000.0 NaN
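The grouping key is the core of this approach, so here is a self-contained sketch of it on the question's sample data. It is a variant rather than the exact code above: the key is built by comparing each null flag with the previous row's flag (`shift` plus `ne`) instead of `diff`, and the names `is_null`, `run` and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (dtypes assumed numeric).
df = pd.DataFrame({
    "Name": ["Ram"] * 5,
    "date": [10, 12, 13, 14, 15],
    "salary": [1000, 8000, np.nan, np.nan, 3000],
    "expense": [np.nan, np.nan, 500, 800, np.nan],
})

is_null = df["salary"].isna()
# A new run starts wherever the null flag differs from the previous row's flag.
run = is_null.ne(is_null.shift()).cumsum().rename("run")
print(run.tolist())  # run ids per row: [1, 1, 2, 2, 3]

result = (
    df.sort_values(by=["salary", "expense"], ascending=[False, True])
      .groupby(run)          # aligned on the index, so sorting first is safe
      .first()               # first non-null value of each column per run
      .reset_index(drop=True)
)
print(result)
```

Note that `first()` takes the first non-null value of each column independently, which is why the approach relies on 'salary' and 'expense' never both being filled in the same run; otherwise values from different rows could end up combined.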
CodePudding user response:
Alternative solution: find the row with the maximum salary and the row with the minimum expense separately for each run of consecutive non-null values, then concatenate the two results in the last step:
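import pandas as pd

# the counters increase on every null row, so consecutive non-null rows share a group id
df['g1'] = df['salary'].isna().cumsum()
df['g2'] = df['expense'].isna().cumsum()
# idxmax/idxmin give NaN for all-null groups; dropna() discards them
df1 = df.loc[df.groupby(['Name', 'g1'])['salary'].idxmax().dropna(), ['Name', 'date', 'salary']]
df2 = df.loc[df.groupby(['Name', 'g2'])['expense'].idxmin().dropna(), ['Name', 'date', 'expense']]
df = pd.concat([df1, df2]).sort_index().reset_index(drop=True)
print(df)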
Name date salary expense
0 Ram 12 8000.0 NaN
1 Ram 13 NaN 500.0
2 Ram 15 3000.0 NaN
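A possible variant of this second approach, sketched under the assumption that the data looks like the question's sample: the null rows are dropped before calling `idxmax`/`idxmin`, so no group is entirely null. Newer pandas releases may emit a deprecation warning when `idxmax` or `idxmin` meets an all-null group, and this ordering avoids that. The names `tmp`, `idx_max`, `idx_min` and `out` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data from the question (dtypes assumed numeric).
df = pd.DataFrame({
    "Name": ["Ram"] * 5,
    "date": [10, 12, 13, 14, 15],
    "salary": [1000, 8000, np.nan, np.nan, 3000],
    "expense": [np.nan, np.nan, 500, 800, np.nan],
})

# Helper columns on a copy, so the original frame is left untouched.
tmp = df.assign(
    g1=df["salary"].isna().cumsum(),
    g2=df["expense"].isna().cumsum(),
)

# Drop null rows first so every remaining group has at least one valid value.
idx_max = tmp.dropna(subset=["salary"]).groupby(["Name", "g1"])["salary"].idxmax()
idx_min = tmp.dropna(subset=["expense"]).groupby(["Name", "g2"])["expense"].idxmin()

out = (
    pd.concat([
        df.loc[idx_max, ["Name", "date", "salary"]],
        df.loc[idx_min, ["Name", "date", "expense"]],
    ])
    .sort_index()            # restore the original row order
    .reset_index(drop=True)
)
print(out)
```

Unlike the sort-and-first approach, this one does not require 'salary' and 'expense' to be mutually exclusive, since each column is processed on its own.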