I have the following subset of data from a dataframe.
{'NID': {131598: '215026851',
131599: '215026851',
131600: '215026851',
131601: '215026851',
131602: '215026851',
131603: '215026851',
131604: '215026851',
131605: '215026851',
131606: '215026851'},
'AbCode': {131598: 0,
131599: 0,
131600: 0,
131601: 0,
131602: 0,
131603: 1,
131604: 0,
131605: 0,
131606: 0},
'ABdat': {131598: Timestamp('2018-01-24 00:00:00'),
131599: Timestamp('2019-01-25 00:00:00'),
131600: NaT,
131601: Timestamp('2019-11-08 00:00:00'),
131602: Timestamp('2020-01-24 00:00:00'),
131603: Timestamp('2020-02-15 00:00:00'),
131604: Timestamp('2020-10-16 00:00:00'),
131605: Timestamp('2020-10-26 00:00:00'),
131606: NaT}}
When formatted the data looks like below
NID AbCode ABdat
131598 215026851 0 2018-01-24
131599 215026851 0 2019-01-25
131600 215026851 0 NaT
131601 215026851 0 2019-11-08
131602 215026851 0 2020-01-24
131603 215026851 1 2020-02-15
131604 215026851 0 2020-10-16
131605 215026851 0 2020-10-26
131606 215026851 0 NaT
I would like to replace the ABdat with missing (NaT) for AbCode = 0 and replace ABdat with ABdat-7days for AbCode = 1
I wrote the following np.where code below to do this.
breed_info['ABdat'] = np.where(breed_info.AbCode == 1, breed_info['ABdat'] - pd.DateOffset(days=7), breed_info['ABdat'].isnull)
The output is presented below
NID AbCode ABdat
131598 215026851 0 <bound method Series.isnull of 49017 ...
131599 215026851 0 <bound method Series.isnull of 49017 ...
131600 215026851 0 <bound method Series.isnull of 49017 ...
131601 215026851 0 <bound method Series.isnull of 49017 ...
131602 215026851 0 <bound method Series.isnull of 49017 ...
131603 215026851 1 1581120000000000000
131604 215026851 0 <bound method Series.isnull of 49017 ...
131605 215026851 0 <bound method Series.isnull of 49017 ...
131606 215026851 0 <bound method Series.isnull of 49017 ...
Could you please advise why the date format is changing and how I can avoid this from happening?
Thanks
CodePudding user response:
Simpluiest is use some pandas solutions with pandas method e.g. Series.where
:
breed_info['ABdat'] = (breed_info['ABdat'] - pd.DateOffset(days=7))
.where(breed_info.AbCode == 1)
With np.where
hacky solution with helper Series
:
breed_info['ABdat'] = np.where(breed_info.AbCode == 1,
breed_info['ABdat'] - pd.DateOffset(days=7),
pd.Series(pd.NaT, index=breed_info.index))
print (breed_info)
NID AbCode ABdat
131598 215026851 0 NaT
131599 215026851 0 NaT
131600 215026851 0 NaT
131601 215026851 0 NaT
131602 215026851 0 NaT
131603 215026851 1 2020-02-08
131604 215026851 0 NaT
131605 215026851 0 NaT
131606 215026851 0 NaT
because if passing pd.NAT
it return underline numpy array (in nanoseconds):
breed_info['ABdat'] = np.where(breed_info.AbCode == 1,
breed_info['ABdat'] - pd.DateOffset(days=7),
pd.NaT)
print (breed_info)
NID AbCode ABdat
131598 215026851 0 NaT
131599 215026851 0 NaT
131600 215026851 0 NaT
131601 215026851 0 NaT
131602 215026851 0 NaT
131603 215026851 1 1581120000000000000
131604 215026851 0 NaT
131605 215026851 0 NaT
131606 215026851 0 NaT
I think reason is bug.