I have this pandas dataframe:
I need to consider only the first "up" of the day.
So my goal is to have this dataframe:
This is some working code:
import pandas as pd
tbl = {"date": ["2022-02-27", "2022-02-27", "2022-02-27", "2022-02-28", "2022-02-28", "2022-02-28"],
       "direction": ["up", "up", "NaN", "NaN", "up", "up"]}
df = pd.DataFrame(tbl)
# turn the 'NaN' strings into real missing values
df = df.replace('NaN', float('nan'))
# to_datetime returns a new Series, so the result has to be assigned back
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df.sort_values(by="date", inplace=True)
Any ideas?
CodePudding user response:
Mask the duplicated values per date and direction:
df['direction'] = df['direction'].mask(df.duplicated(['date', 'direction']))
        date direction
0 2022-02-27        up
1 2022-02-27       NaN
2 2022-02-27       NaN
3 2022-02-28       NaN
4 2022-02-28        up
5 2022-02-28       NaN
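For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, assuming the sample data from the question. `DataFrame.duplicated` flags every row whose (date, direction) pair has already been seen, and `Series.mask` replaces the flagged rows with NaN. The helper name `keep_first_per_day` is made up for illustration.

```python
import pandas as pd


def keep_first_per_day(df):
    """Hypothetical helper: blank out repeated (date, direction) pairs."""
    out = df.copy()
    # True for every row whose (date, direction) pair appeared earlier
    dup = out.duplicated(["date", "direction"])
    # mask() sets the flagged rows to NaN and leaves the rest untouched
    out["direction"] = out["direction"].mask(dup)
    return out


# Sample data from the question, with None standing in for the missing values
df = pd.DataFrame({
    "date": ["2022-02-27", "2022-02-27", "2022-02-27",
             "2022-02-28", "2022-02-28", "2022-02-28"],
    "direction": ["up", "up", None, None, "up", "up"],
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

result = keep_first_per_day(df)
print(result)
```

Note that this keeps the first occurrence of each distinct direction value per day, so if the column also held other values (say "down"), the first one of those per day would survive as well.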
CodePudding user response:
import numpy as np
df['direction'] = df.assign(cnt=df.groupby(['date', 'direction']).cumcount()).apply(lambda row: row['direction'] if row['cnt'] == 0 else np.nan, axis=1)
df
        date direction
0 2022-02-27        up
1 2022-02-27       NaN
2 2022-02-27       NaN
3 2022-02-28       NaN
4 2022-02-28        up
5 2022-02-28       NaN
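The same `cumcount` idea can probably be written without the row-wise `apply`. The sketch below is a variant of the answer above, not the answerer's own code, and assumes the sample data from the question: it keeps a row's direction only when that row is the first of its (date, direction) group.

```python
import pandas as pd

# Sample data from the question, with None standing in for the missing values
df = pd.DataFrame({
    "date": ["2022-02-27", "2022-02-27", "2022-02-27",
             "2022-02-28", "2022-02-28", "2022-02-28"],
    "direction": ["up", "up", None, None, "up", "up"],
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# cumcount() numbers the rows inside each (date, direction) group from 0,
# so eq(0) is True only for the first row of each group
is_first = df.groupby(["date", "direction"]).cumcount().eq(0)

# where() keeps values where the condition holds and writes NaN elsewhere;
# rows that were already missing stay missing either way
df["direction"] = df["direction"].where(is_first)
print(df)
```

On a large frame this should be noticeably faster than `apply(..., axis=1)`, since the comparison and the replacement both run column-wise.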