How to replace all the non-first values of columns with NaN based on date?

Time:03-27

I have this dataframe:

[image: df]

For each day, I want to keep only the first non-NaN value in each column and replace every later value with NaN.

This is how the dataframe should look:

[image: desired df]

This is what I tried:

import pandas as pd

tbl = {"date": ["2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27",
                "2022-02-28", "2022-02-28", "2022-02-28", "2022-02-28"],
       "value1": ["NaN", 0.1, 0.1, "NaN", "NaN", "NaN", "NaN", 0.3, "NaN"],
       "value2": ["NaN", "NaN", 0.2, 0.3, "NaN", 0.3, 0.4, "NaN", "NaN"]}

df = pd.DataFrame(tbl)
# turn the "NaN" strings into real missing values
df = df.replace('NaN', float('nan'))
# to_datetime returns a new Series, so assign it back to the column
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# I'm trying to use replace, but it does not take the date into account:
#
# DataFrame.replace(to_replace=None, value=NoDefault.no_default, inplace=False, limit=None,
#                   regex=False, method=NoDefault.no_default)

CodePudding user response:

groupby rank

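First create a boolean mask with isna, then use groupby rank with method='first' to assign numerical ranks within each date. Because False sorts before True, the first non-NaN entry of each column gets rank 1 in its group. Finally, index the original dataframe with the positions where the rank is 1, which keeps those values and sets everything else to NaN.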

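# move the date into the index so only the value columns are ranked
df = df.set_index('date')
# keep the cells whose per-date rank is 1; all other cells become NaN
df[df.isna().groupby('date').rank(method='first').eq(1)]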

Result

            value1  value2
date                      
2022-02-27     NaN     NaN
2022-02-27     0.1     NaN
2022-02-27     NaN     0.2
2022-02-27     NaN     NaN
2022-02-27     NaN     NaN
2022-02-28     NaN     0.3
2022-02-28     NaN     NaN
2022-02-28     0.3     NaN
2022-02-28     NaN     NaN
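
For comparison, here is a minimal sketch of a different way to build the same mask. It is not the method from the answer above: it uses a per-date running count of non-NaN values instead of rank, and it leaves date as an ordinary column. The names value_cols, present, seen, is_first and result are only illustrative, and the sample data is copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with real NaN values instead of "NaN" strings
df = pd.DataFrame({
    "date": ["2022-02-27"] * 5 + ["2022-02-28"] * 4,
    "value1": [np.nan, 0.1, 0.1, np.nan, np.nan, np.nan, np.nan, 0.3, np.nan],
    "value2": [np.nan, np.nan, 0.2, 0.3, np.nan, 0.3, 0.4, np.nan, np.nan],
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

value_cols = ["value1", "value2"]  # illustrative: the columns to thin out

# True where a value is present
present = df[value_cols].notna()

# Running count of present values within each date, per column
seen = present.astype(int).groupby(df["date"]).cumsum()

# A cell is the first value of its day if it is present and the count just hit 1
is_first = present & seen.eq(1)

# Keep only those cells; every other cell in the value columns becomes NaN
result = df.copy()
result[value_cols] = df[value_cols].where(is_first)
print(result)
```

Under these assumptions the kept cells should be the same ones shown in the result above. The cast to int is there only so that the cumulative sum does not depend on how a given pandas version treats boolean columns.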