How to consider the first value of a pandas dataframe filtering by date

I have this pandas dataframe:

I need to consider only the first "up" of the day.

So my goal is to have this dataframe:

This is some working code:

import pandas as pd

tbl = {
    "date": ["2022-02-27", "2022-02-27", "2022-02-27",
             "2022-02-28", "2022-02-28", "2022-02-28"],
    "direction": ["up", "up", "NaN", "NaN", "up", "up"],
}

df = pd.DataFrame(tbl)
# Turn the "NaN" placeholder strings into real missing values
df = df.replace("NaN", float("nan"))
# Assign the result back: pd.to_datetime returns a new Series and does not modify the column
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
df.sort_values(by="date", inplace=True)

Any ideas?

CodePudding user response：

Mask the duplicated values per date and direction:

df['direction'] = df['direction'].mask(df.duplicated(['date', 'direction']))

         date direction
0  2022-02-27        up
1  2022-02-27       NaN
2  2022-02-27       NaN
3  2022-02-28       NaN
4  2022-02-28        up
5  2022-02-28       NaN
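
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, using the sample data from the question. The names `is_repeat` and `result` are only illustrative. Note that `duplicated` keeps the first occurrence of every distinct `(date, direction)` pair, so if the column held other values besides "up", the first of each would be kept as well.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": ["2022-02-27", "2022-02-27", "2022-02-27",
             "2022-02-28", "2022-02-28", "2022-02-28"],
    "direction": ["up", "up", np.nan, np.nan, "up", "up"],
})

# duplicated() marks every repeat of a (date, direction) pair as True
# and leaves the first occurrence of each pair as False.
is_repeat = df.duplicated(["date", "direction"])

# mask() replaces the flagged rows with NaN and keeps the others unchanged.
result = df.assign(direction=df["direction"].mask(is_repeat))
print(result)
```

Using `assign` here returns a new dataframe and leaves `df` untouched, which makes it easier to compare before and after.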

CodePudding user response：

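import numpy as np

# Number each row within its (date, direction) group, then keep only the first one
cnt = df.groupby(['date', 'direction']).cumcount()
df['direction'] = df.assign(cnt=cnt).apply(lambda row: row['direction'] if row['cnt'] == 0 else np.nan, axis=1)
df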

         date direction
0  2022-02-27        up
1  2022-02-27       NaN
2  2022-02-27       NaN
3  2022-02-28       NaN
4  2022-02-28        up
5  2022-02-28       NaN
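
If the row-wise `apply` is too slow on a larger frame, the same `cumcount` idea can probably be written without it. This is only a sketch under the assumption that the data looks like the sample in the question; `order` and `first_only` are illustrative names. `Series.where` keeps the values where the condition is True and puts NaN everywhere else, so rows whose direction is already NaN stay NaN either way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": ["2022-02-27", "2022-02-27", "2022-02-27",
             "2022-02-28", "2022-02-28", "2022-02-28"],
    "direction": ["up", "up", np.nan, np.nan, "up", "up"],
})

# Position of each row inside its (date, direction) group: 0 for the first row, 1 for the next, ...
order = df.groupby(["date", "direction"]).cumcount()

# Keep the direction only where the row is the first of its group; everything else becomes NaN.
first_only = df["direction"].where(order == 0)
print(df.assign(direction=first_only))
```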