I have this dataframe:
For each day, I want to keep only the first non-NaN value in each column and replace every other value with NaN.
This is how the dataframe should look:
This is what I tried:
import pandas as pd
from datetime import datetime
tbl = {"date" : ["2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27", "2022-02-28", "2022-02-28", "2022-02-28", "2022-02-28"],
"value1" : ["NaN", 0.1, 0.1, "NaN", "NaN", "NaN", "NaN", 0.3, "NaN"],
"value2" : ["NaN", "NaN", 0.2, 0.3, "NaN", 0.3, 0.4, "NaN", "NaN"]}
df = pd.DataFrame(tbl)
df = df.replace('NaN', float('nan'))
# assign the result back, otherwise the conversion is discarded
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# I'm trying to use replace, but it does not take the date into account:
# DataFrame.replace(to_replace=None, value=NoDefault.no_default, inplace=False, limit=None, regex=False, method=NoDefault.no_default)
CodePudding user response:
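First create a boolean mask with isna, then use groupby and rank with method='first' to assign numerical ranks within each day. Non-NaN cells (False) rank ahead of NaN cells (True), so the first non-NaN value of each day gets rank 1 in its column. Finally, keep only the values in the original dataframe where the rank is 1; every other cell becomes NaN.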
df = df.set_index('date')
df[df.isna().groupby('date').rank(method='first').eq(1)]
Result
            value1  value2
date
2022-02-27     NaN     NaN
2022-02-27     0.1     NaN
2022-02-27     NaN     0.2
2022-02-27     NaN     NaN
2022-02-27     NaN     NaN
2022-02-28     NaN     0.3
2022-02-28     NaN     NaN
2022-02-28     0.3     NaN
2022-02-28     NaN     NaN
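
For comparison, here is a minimal sketch of a different way to get the same result, using a running count per day instead of rank. It is not the method from the answer above, just an alternative under the assumption that "first" means the first non-NaN value in row order within each date. The names value_cols, present, running, keep and result are made up for illustration, and the date stays an ordinary column rather than the index.

```python
import numpy as np
import pandas as pd

# Same data as in the question, with real NaN values instead of "NaN" strings
df = pd.DataFrame({
    "date": ["2022-02-27"] * 5 + ["2022-02-28"] * 4,
    "value1": [np.nan, 0.1, 0.1, np.nan, np.nan, np.nan, np.nan, 0.3, np.nan],
    "value2": [np.nan, np.nan, 0.2, 0.3, np.nan, 0.3, 0.4, np.nan, np.nan],
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

value_cols = ["value1", "value2"]  # hypothetical name: the columns to filter

# 1 where a value is present, 0 where it is missing
present = df[value_cols].notna().astype(int)

# Running count of present values within each day, computed per column
running = present.groupby(df["date"]).cumsum()

# Keep a cell only if it is present and is the first present value of its day
keep = present.eq(1) & running.eq(1)

result = df.copy()
result[value_cols] = df[value_cols].where(keep)  # cells not kept become NaN
print(result)
```

The check on present is needed because the running count stays at 1 for any NaN rows that follow the first value, and where would leave those as NaN anyway, but making the condition explicit keeps the mask itself correct if you reuse it elsewhere.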