Pandas: perform max/min operations on a column over continuous rows

I am trying to get the maximum of the salary column and the minimum of the expense column within each run of consecutive valid (non-null) values.

I tried iterating over the rows, but that did not work properly. I am looking for a pandas-native solution.

Name   date    salary  expense
Ram    10      1000    NaN
Ram    12      8000    NaN
Ram    13      NaN     500
Ram    14      NaN     800
Ram    15      3000    NaN

My expected output is:

Name   date    salary  expense
Ram    12      8000    NaN
Ram    13      NaN     500
Ram    15      3000    NaN
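
For reference, the sample above can be reproduced as a DataFrame roughly like this. It is a minimal sketch, assuming the amounts are numeric and the missing values are NaN (the dtypes are not stated in the question):

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN marks the missing amounts (assumed dtype: float)
df = pd.DataFrame({
    'Name': ['Ram'] * 5,
    'date': [10, 12, 13, 14, 15],
    'salary': [1000, 8000, np.nan, np.nan, 3000],
    'expense': [np.nan, np.nan, 500, 800, np.nan],
})
print(df)
```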

CodePudding user response:

Assuming 'salary' and 'expense' are mutually exclusive in each row, build a custom group key that changes whenever 'salary' switches between null and non-null, sort the rows so the highest salary and lowest expense come first, then take the first row of each group:

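# Group id increases each time 'salary' switches between null and non-null
group = df['salary'].isna().diff().fillna(False).cumsum()
# NaNs sort last, so each group's max salary / min expense row comes first
(df.sort_values(by=['salary', 'expense'], ascending=[False, True])
   .groupby(group).first()
   .reset_index(drop=True)
)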

Output:

  Name  date  salary  expense
0  Ram    12  8000.0      NaN
1  Ram    13     NaN    500.0
2  Ram    15  3000.0      NaN
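
To see what the custom group looks like, the sketch below rebuilds the sample data and labels each run of consecutive rows. It detects the switches with `ne`/`shift` rather than `diff`, which is a swapped-in but equivalent way to build the key for this data; the name `block` is only illustrative, and the sample frame is an assumption about how the question's data is stored.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed to be numeric with NaN for missing values)
df = pd.DataFrame({
    'Name': ['Ram'] * 5,
    'date': [10, 12, 13, 14, 15],
    'salary': [1000, 8000, np.nan, np.nan, 3000],
    'expense': [np.nan, np.nan, 500, 800, np.nan],
})

is_null = df['salary'].isna()
# A new block starts wherever the null/non-null state differs from the previous row
block = is_null.ne(is_null.shift()).cumsum().rename('block')
print(block.tolist())  # [1, 1, 2, 2, 3]

# Same sort-then-first step as in the answer, using the illustrative key
result = (df.sort_values(by=['salary', 'expense'], ascending=[False, True])
            .groupby(block).first()
            .reset_index(drop=True))
print(result)
```

Note that `first()` returns the first non-null value per column within each group, so this only yields whole rows as long as 'salary' and 'expense' never appear in the same row.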

CodePudding user response:

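Alternatively, process each column separately: find the row with the maximum salary and the row with the minimum expense within each run of valid values, then concatenate the two results in the last step: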

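import pandas as pd

# Helper group ids: the counter increases at each NaN, so every run of valid values shares one id
df['g1'] = df['salary'].isna().cumsum()
df['g2'] = df['expense'].isna().cumsum()

# Row labels of the max salary / min expense per group; groups with no valid value are dropped via dropna()
df1 = df.loc[df.groupby(['Name','g1'])['salary'].idxmax().dropna(), ['Name','date','salary']]
df2 = df.loc[df.groupby(['Name','g2'])['expense'].idxmin().dropna(), ['Name','date','expense']]

# Combine both results and restore the original row order
df = pd.concat([df1, df2]).sort_index().reset_index(drop=True)
print(df)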
  Name  date  salary  expense
0  Ram    12  8000.0      NaN
1  Ram    13     NaN    500.0
2  Ram    15  3000.0      NaN
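
A possible variant of the same idea filters out the null rows before calling `idxmax`/`idxmin`, so that no group consists only of NaN values. This is a sketch under assumptions, not part of the answer above: `pick_extreme` and `run` are hypothetical names, the sample frame is rebuilt from the question, and 'salary' and 'expense' are assumed never to be filled in the same row.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed to be numeric with NaN for missing values)
df = pd.DataFrame({
    'Name': ['Ram'] * 5,
    'date': [10, 12, 13, 14, 15],
    'salary': [1000, 8000, np.nan, np.nan, 3000],
    'expense': [np.nan, np.nan, 500, 800, np.nan],
})

def pick_extreme(frame, col, how):
    """Hypothetical helper: row labels of the max/min of `col` per run of valid values."""
    # The counter increases at every NaN, so each run of valid values shares one id
    run_id = frame[col].isna().cumsum().rename('run')
    # Keep only rows where `col` is present, so every group has at least one value
    valid = frame[frame[col].notna()]
    grouped = valid.groupby(['Name', run_id.loc[valid.index]])[col]
    return grouped.idxmax() if how == 'max' else grouped.idxmin()

# Collect the selected row labels and restore the original row order
rows = sorted(set(pick_extreme(df, 'salary', 'max').tolist()
                  + pick_extreme(df, 'expense', 'min').tolist()))
result = df.loc[rows].reset_index(drop=True)
print(result)
```

Unlike the concatenation above, this keeps each selected row as it is in the original frame, so it does not add helper columns to `df` or overwrite it.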