Home > Net >  Why does date format change with np.where and how do you stop it from happening
Why does date format change with np.where and how do you stop it from happening

Time:03-02

I have the following subset of data from a dataframe.

{'NID': {131598: '215026851',
  131599: '215026851',
  131600: '215026851',
  131601: '215026851',
  131602: '215026851',
  131603: '215026851',
  131604: '215026851',
  131605: '215026851',
  131606: '215026851'},
 'AbCode': {131598: 0,
  131599: 0,
  131600: 0,
  131601: 0,
  131602: 0,
  131603: 1,
  131604: 0,
  131605: 0,
  131606: 0},
 'ABdat': {131598: Timestamp('2018-01-24 00:00:00'),
  131599: Timestamp('2019-01-25 00:00:00'),
  131600: NaT,
  131601: Timestamp('2019-11-08 00:00:00'),
  131602: Timestamp('2020-01-24 00:00:00'),
  131603: Timestamp('2020-02-15 00:00:00'),
  131604: Timestamp('2020-10-16 00:00:00'),
  131605: Timestamp('2020-10-26 00:00:00'),
  131606: NaT}}

When formatted the data looks like below

          NID     AbCode  ABdat
131598  215026851   0   2018-01-24
131599  215026851   0   2019-01-25
131600  215026851   0   NaT
131601  215026851   0   2019-11-08
131602  215026851   0   2020-01-24
131603  215026851   1   2020-02-15
131604  215026851   0   2020-10-16
131605  215026851   0   2020-10-26
131606  215026851   0   NaT 

I would like to replace the ABdat with missing (NaT) for AbCode = 0 and replace ABdat with ABdat-7days for AbCode = 1

I wrote the following np.where code below to do this.

breed_info['ABdat'] = np.where(breed_info.AbCode == 1, breed_info['ABdat'] - pd.DateOffset(days=7), breed_info['ABdat'].isnull)

The output is presented below

          NID   AbCode       ABdat
131598  215026851   0   <bound method Series.isnull of 49017 ...
131599  215026851   0   <bound method Series.isnull of 49017 ...
131600  215026851   0   <bound method Series.isnull of 49017 ...
131601  215026851   0   <bound method Series.isnull of 49017 ...
131602  215026851   0   <bound method Series.isnull of 49017 ...
131603  215026851   1   1581120000000000000
131604  215026851   0   <bound method Series.isnull of 49017 ...
131605  215026851   0   <bound method Series.isnull of 49017 ...
131606  215026851   0   <bound method Series.isnull of 49017 ...

Could you please advise why the date format is changing and how I can avoid this from happening?

Thanks

CodePudding user response:

Simpluiest is use some pandas solutions with pandas method e.g. Series.where:

breed_info['ABdat'] = (breed_info['ABdat'] - pd.DateOffset(days=7))
                                                        .where(breed_info.AbCode == 1)

With np.where hacky solution with helper Series:

breed_info['ABdat'] = np.where(breed_info.AbCode == 1,
                               breed_info['ABdat'] - pd.DateOffset(days=7),
                                pd.Series(pd.NaT, index=breed_info.index))
print (breed_info)
              NID  AbCode      ABdat
131598  215026851       0        NaT
131599  215026851       0        NaT
131600  215026851       0        NaT
131601  215026851       0        NaT
131602  215026851       0        NaT
131603  215026851       1 2020-02-08
131604  215026851       0        NaT
131605  215026851       0        NaT
131606  215026851       0        NaT

because if passing pd.NAT it return underline numpy array (in nanoseconds):

breed_info['ABdat'] = np.where(breed_info.AbCode == 1, 
                               breed_info['ABdat'] - pd.DateOffset(days=7),
                               pd.NaT)
print (breed_info)
              NID  AbCode                ABdat
131598  215026851       0                  NaT
131599  215026851       0                  NaT
131600  215026851       0                  NaT
131601  215026851       0                  NaT
131602  215026851       0                  NaT
131603  215026851       1  1581120000000000000
131604  215026851       0                  NaT
131605  215026851       0                  NaT
131606  215026851       0                  NaT

I think reason is bug.

  • Related