Problems with replacing NaT in pandas correctly

Time:04-07

I have a dataframe that contains some NaT values.

                       Date Value
6312957 2012-01-01 23:58:00   -49
6312958 2012-01-01 23:59:00   -49
6312959                 NaT   -48
6312960 2012-01-02 00:01:00   -47
6312961 2012-01-02 00:02:00   -46

I am trying to replace these NaT values by adding one minute to the previous entry.

indices_of_NAT = np.flatnonzero(pd.isna(df.loc[:, "Date"]))
df.loc[indices_of_NAT, "Date"] = df.loc[indices_of_NAT - 1, "Date"] + pd.Timedelta(minutes=1)

This produces the correct timestamps and indices, which I checked manually. The only problem is that they don't replace the NaT values for whatever reason. I wonder if something goes wrong with the indexing in my last line of code. Is there something obvious I am missing?

CodePudding user response:

You can use fillna with the shifted values plus 1 minute:

df['Date'] = df['Date'].fillna(df['Date'].shift().add(pd.Timedelta('1min')))
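To try this on its own, here is a minimal sketch that rebuilds the five rows from the question (the frame construction is my assumption about how the data looks; the fill line is the same idea as above):

```python
import pandas as pd

# Small frame mirroring the rows shown in the question.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2012-01-01 23:58:00", "2012-01-01 23:59:00", None,
             "2012-01-02 00:01:00", "2012-01-02 00:02:00"]
        ),
        "Value": [-49, -49, -48, -47, -46],
    },
    index=[6312957, 6312958, 6312959, 6312960, 6312961],
)

# shift() moves every timestamp down one row, so each row sees its predecessor.
previous_plus_minute = df["Date"].shift() + pd.Timedelta(minutes=1)

# fillna matches rows by index label and only touches rows that are NaT.
df["Date"] = df["Date"].fillna(previous_plus_minute)
print(df)
```

Because shift() keeps the original index, the fill values line up with the rows they are meant for, whatever the index labels are.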

Another method is to interpolate. For this you need to temporarily convert the dates to numbers. This approach can fill gaps of more than one row, calculates the increment automatically, and lets you choose among several interpolation methods (see the documentation for Series.interpolate):

df['Date'] = (pd.to_datetime(pd.to_numeric(df['Date'])
                               .mask(df['Date'].isna())
                               .interpolate('linear'))
              )

Example:

                 Date  Value               shift         interpolate
0 2012-01-01 23:58:00    -49 2012-01-01 23:58:00 2012-01-01 23:58:00
1 2012-01-01 23:59:00    -49 2012-01-01 23:59:00 2012-01-01 23:59:00
2                 NaT    -48 2012-01-02 00:00:00 2012-01-02 00:00:00
3 2012-01-02 00:01:00    -47 2012-01-02 00:01:00 2012-01-02 00:01:00
4                 NaT    -48 2012-01-02 00:02:00 2012-01-02 00:01:20
5                 NaT    -48                 NaT 2012-01-02 00:01:40
6 2012-01-02 00:02:00    -46 2012-01-02 00:02:00 2012-01-02 00:02:00
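The interpolate column can be reproduced with a self-contained sketch like the one below. The explicit cast to datetime64[ns] is my addition, on the assumption that the column might not already have nanosecond resolution; pd.to_datetime reads bare numbers as nanoseconds by default, so the round trip should use one unit throughout.

```python
import pandas as pd

# The Date column from the example table, with one single and one double gap.
dates = pd.Series(pd.to_datetime([
    "2012-01-01 23:58:00", "2012-01-01 23:59:00", None,
    "2012-01-02 00:01:00", None, None, "2012-01-02 00:02:00",
]))

# Pin nanosecond resolution so the integer round trip uses a single unit.
dates_ns = dates.astype("datetime64[ns]")

# As integers, NaT turns into a sentinel value, so mask those rows to NaN.
as_number = pd.to_numeric(dates_ns).mask(dates_ns.isna())

# Interpolate linearly over row positions, then convert back to datetimes.
filled = pd.to_datetime(as_number.interpolate(method="linear"))
print(filled)
```

Note that linear interpolation spreads the missing rows evenly between their neighbours, so a two-row gap inside a one-minute interval gets 20-second steps rather than one-minute steps.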

CodePudding user response:

Use Series.fillna with the shifted values plus 1 minute:

df['Date'] = df['Date'].fillna(df['Date'].shift() + pd.Timedelta(minutes=1))

Or forward fill the missing values and add 1 minute:

df['Date'] = df['Date'].fillna(df['Date'].ffill() + pd.Timedelta(minutes=1))

You can see the difference between the two on data with consecutive NaT values:

df['Date'] = pd.to_datetime(df['Date'])

df['Date1'] = df['Date'].fillna(df['Date'].shift() + pd.Timedelta(minutes=1))
df['Date2'] = df['Date'].fillna(df['Date'].ffill() + pd.Timedelta(minutes=1))
print(df)
                       Date  Value               Date1               Date2
6312957 2012-01-01 23:58:00    -49 2012-01-01 23:58:00 2012-01-01 23:58:00
6312958 2012-01-01 23:59:00    -49 2012-01-01 23:59:00 2012-01-01 23:59:00
6312959                 NaT    -48 2012-01-02 00:00:00 2012-01-02 00:00:00
6312960 2012-01-02 00:01:00    -47 2012-01-02 00:01:00 2012-01-02 00:01:00
6312961 2012-01-02 00:02:00    -46 2012-01-02 00:02:00 2012-01-02 00:02:00
6312962                 NaT    -47 2012-01-02 00:03:00 2012-01-02 00:03:00
6312963                 NaT    -47                 NaT 2012-01-02 00:03:00
6312967 2012-01-02 00:01:00    -47 2012-01-02 00:01:00 2012-01-02 00:01:00
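
As for why the .loc assignment in the question has no effect: a likely explanation is index alignment. When the right-hand side is a Series, pandas matches it to the target rows by label, and here the right-hand side is labelled with indices_of_NAT - 1 while the target rows are labelled indices_of_NAT. The sketch below illustrates this on a small made-up frame with a default RangeIndex (an assumption, since the question's real index is not shown in full):

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; with a RangeIndex, positions and labels coincide.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2012-01-01 23:59:00", None, "2012-01-02 00:01:00"]),
    "Value": [-49, -48, -47],
})

nat_rows = np.flatnonzero(df["Date"].isna())

# The values are right, but the Series is labelled with nat_rows - 1.
replacement = df.loc[nat_rows - 1, "Date"] + pd.Timedelta(minutes=1)

# Assigning a Series aligns on labels: no label in `replacement` matches
# nat_rows, so the target rows receive NaT again.
attempt = df.copy()
attempt.loc[nat_rows, "Date"] = replacement

# Passing a bare array skips alignment and assigns by position instead.
fixed = df.copy()
fixed.loc[nat_rows, "Date"] = replacement.to_numpy()

print(attempt)
print(fixed)
```

The fillna approaches above avoid the issue because shift() and ffill() keep the original labels.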