I want to write a condition that checks a date and returns another date.
But numpy.where keeps converting my date into a long integer. I looked around for a solution, but I couldn't apply what I found to my situation. I'm a bit new to Python.
Here is my code:
import numpy as np
import pandas as pd
df = {'my_date': ['1900-01-01', '2021-10-01', '2021-08-04']}
df = pd.DataFrame(df)
df['new_date'] = np.where(pd.to_datetime(df.my_date) > pd.to_datetime('2022-01-01'), pd.to_datetime('1900-01-01'), pd.to_datetime(df.my_date))
df
my_date new_date
0 1900-01-01 -2208988800000000000
1 2021-10-01 1633046400000000000
2 2021-08-04 1628035200000000000
I don't understand what I'm doing wrong. Is there a better way to write this condition?
Thanks
CodePudding user response:
I don't think you need np.where here, or so many pd.to_datetime calls.
df['my_date'] = pd.to_datetime(df['my_date'])
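# Overwrite only the my_date column; a bare df[mask] = ... would overwrite every column in the matching rows
df.loc[df['my_date'] > '2022-01-01', 'my_date'] = pd.Timestamp('1900-01-01')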
If you want to put the result in a new column instead, use this:
df['new_column'] = df['my_date'].where(~(df['my_date'] > '2022-01-01'), pd.Timestamp('1900-01-01'))
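For reference, here is a minimal end-to-end sketch of the same idea using Series.mask, which replaces values where the condition is True (the mirror image of Series.where, so the condition does not need to be negated). The sample date '2023-03-15' and the names cutoff and fallback are my own additions for illustration, since none of the dates in the question actually falls after 2022-01-01. Because the operation stays inside pandas, the result should remain a datetime column rather than integers.

```python
import pandas as pd

# Sample data; '2023-03-15' is a hypothetical value added so the condition matches something
df = pd.DataFrame({'my_date': ['1900-01-01', '2021-10-01', '2023-03-15']})
df['my_date'] = pd.to_datetime(df['my_date'])

cutoff = pd.Timestamp('2022-01-01')    # dates after this get replaced
fallback = pd.Timestamp('1900-01-01')  # replacement value

# mask() swaps in `fallback` wherever the condition is True
df['new_date'] = df['my_date'].mask(df['my_date'] > cutoff, fallback)

print(df)
print(df.dtypes)
```

Passing pd.Timestamp values instead of plain strings for the comparison and the replacement makes the intended types explicit, which is a style choice on my part rather than something the approach above strictly requires.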