I need to assign a given datetime value to the timestamp column in rows that contain NaT, if a condition on another column is met. All values in timestamp are either datetime64[ns] or NaT.
EDIT:
Sample data:
import numpy as np
import pandas as pd

dates = [pd.to_datetime('2022-10-14 10:13:52', format="%Y-%m-%d %H:%M:%S"),
         pd.to_datetime('2022-10-14 17:43:52', format="%Y-%m-%d %H:%M:%S"),
         pd.to_datetime('2022-10-14 09:00:10', format="%Y-%m-%d %H:%M:%S")]
data = {'A': [-0.5, -0.5, 0.7, 1, 0.65, 0.5], 'timestamp': pd.Series(dates, index=[1, 3, 5])}
df = pd.DataFrame(data=data, index=[0, 1, 2, 3, 4, 5])
Output:
A timestamp
0 -0.50 NaT
1 -0.50 2022-10-14 10:13:52
2 0.70 NaT
3 1.00 2022-10-14 17:43:52
4 0.65 NaT
5 0.50 2022-10-14 09:00:10
I then do the following:
threshold = 0.65
null_date = pd.to_datetime('2022-09-01 09:00:00', format = "%Y-%m-%d %H:%M:%S")
df.timestamp = np.where(df.A >= threshold, null_date, df.timestamp)
However, this turns all the values in timestamp into object type:
A timestamp
0 -0.50 None
1 -0.50 1665742432000000000
2 0.70 2022-09-01 09:00:00
3 1.00 2022-09-01 09:00:00
4 0.65 2022-09-01 09:00:00
5 0.50 1665738010000000000
That is, the NaTs in rows where the condition is not met are replaced with None, and the datetimes in those rows are replaced too: they come back as integers (nanoseconds since the Unix epoch). Only the rows where the condition was met get a datetime.
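The dtype change can be reproduced in isolation. A minimal sketch, assuming a small hypothetical two-row Series instead of the full frame: np.where converts each argument to a NumPy array, and (in the NumPy/pandas versions assumed here) a pandas Timestamp scalar becomes an object array rather than a datetime64 one, so the combined result is promoted to object dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row column: one NaT and one real datetime64[ns] value
ts = pd.Series([pd.NaT, pd.Timestamp('2022-10-14 10:13:52')], dtype='datetime64[ns]')
cond = np.array([False, False])  # condition not met in either row
null_date = pd.to_datetime('2022-09-01 09:00:00', format='%Y-%m-%d %H:%M:%S')

# null_date (a pandas Timestamp) is turned into an object array by np.where;
# promoting object with datetime64[ns] gives an object result
out = np.where(cond, null_date, ts)
print(out.dtype)
```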
Does anyone have any suggestions on how to replace NaT with a given datetime based on a condition?
EDIT-2:
It worked with a lambda function:
df.timestamp = df[['A', 'timestamp']].apply(lambda x: null_date if x['A'] >= threshold else x['timestamp'], axis=1)
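The row-wise apply calls the lambda once per row. A minimal vectorized sketch of the same logic, assuming Series.mask (swapped in here; it is not part of the original post) and the sample data above: where A >= threshold the value becomes null_date, otherwise the existing datetime or NaT is kept.

```python
import pandas as pd

dates = [pd.Timestamp('2022-10-14 10:13:52'),
         pd.Timestamp('2022-10-14 17:43:52'),
         pd.Timestamp('2022-10-14 09:00:10')]
df = pd.DataFrame({'A': [-0.5, -0.5, 0.7, 1, 0.65, 0.5],
                   'timestamp': pd.Series(dates, index=[1, 3, 5])},
                  index=[0, 1, 2, 3, 4, 5])
threshold = 0.65
null_date = pd.Timestamp('2022-09-01 09:00:00')

# mask() replaces values where the condition is True and keeps the rest,
# so the column stays a datetime64 column (NaT stays NaT)
df['timestamp'] = df['timestamp'].mask(df['A'] >= threshold, null_date)
print(df)
```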
Output:
A timestamp
0 -0.50 NaT
1 -0.50 2022-10-14 10:13:52
2 0.70 2022-09-01 09:00:00
3 1.00 2022-09-01 09:00:00
4 0.65 2022-09-01 09:00:00
5 0.50 2022-10-14 09:00:10
Answer:
The type of null_date (a pandas Timestamp) does not match the dtype of df['timestamp'] (datetime64[ns]). Both must be datetime64; otherwise np.where falls back to object dtype. Use this:
import numpy as np
import pandas as pd

threshold = 0.65
null_date = pd.to_datetime('2022-09-01 09:00:00', format="%Y-%m-%d %H:%M:%S")
null_date = np.datetime64(null_date)  # convert the pandas Timestamp to numpy.datetime64
df['timestamp'] = np.where(df['A'] >= threshold, null_date, df['timestamp'])
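A variant of the same fix, shown end to end on the sample data and assuming Timestamp.to_datetime64() (swapped in for np.datetime64(null_date)); both produce a numpy.datetime64 scalar, so np.where sees only datetime64 inputs and NaT rows stay NaT.

```python
import numpy as np
import pandas as pd

dates = [pd.Timestamp('2022-10-14 10:13:52'),
         pd.Timestamp('2022-10-14 17:43:52'),
         pd.Timestamp('2022-10-14 09:00:10')]
df = pd.DataFrame({'A': [-0.5, -0.5, 0.7, 1, 0.65, 0.5],
                   'timestamp': pd.Series(dates, index=[1, 3, 5])},
                  index=[0, 1, 2, 3, 4, 5])
threshold = 0.65
null_date = pd.to_datetime('2022-09-01 09:00:00', format='%Y-%m-%d %H:%M:%S')

# to_datetime64() returns a numpy.datetime64 scalar, so both branches of
# np.where are datetime64 and the result array keeps a datetime64 dtype
df['timestamp'] = np.where(df['A'] >= threshold, null_date.to_datetime64(), df['timestamp'])
print(df)
```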
Answer:
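I think this will work: assign through .loc with a boolean mask instead of using np.where, so the timestamp column keeps its datetime dtype: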
import pandas as pd
dates = [pd.to_datetime('2022-10-14 10:13:52', format = "%Y-%m-%d %H:%M:%S"),
pd.to_datetime('2022-10-14 17:43:52', format = "%Y-%m-%d %H:%M:%S"),
pd.to_datetime('2022-10-14 09:00:10', format = "%Y-%m-%d %H:%M:%S")]
data = {'A': [-0.5, -0.5, 0.7, 1, 0.65, 0.5], 'timestamp': pd.Series(dates, index=[1, 3, 5])}
df = pd.DataFrame(data = data, index=[0, 1, 2, 3, 4, 5])
threshold = 0.65
null_date = pd.to_datetime('2022-09-01 09:00:00', format = "%Y-%m-%d %H:%M:%S")
# assign through .loc instead of np.where; the column keeps its datetime64 dtype
df.loc[df.A >= threshold, 'timestamp'] = null_date
>>> df
A timestamp
0 -0.50 NaT
1 -0.50 2022-10-14 10:13:52
2 0.70 2022-09-01 09:00:00
3 1.00 2022-09-01 09:00:00
4 0.65 2022-09-01 09:00:00
5 0.50 2022-10-14 09:00:10
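Both answers overwrite every row where A >= threshold, including row 3, which already had a datetime. If, as the question's first sentence describes, only NaT rows should be filled, a minimal sketch (assuming that reading of the question) is to combine the condition with isna() in the .loc mask:

```python
import pandas as pd

dates = [pd.Timestamp('2022-10-14 10:13:52'),
         pd.Timestamp('2022-10-14 17:43:52'),
         pd.Timestamp('2022-10-14 09:00:10')]
df = pd.DataFrame({'A': [-0.5, -0.5, 0.7, 1, 0.65, 0.5],
                   'timestamp': pd.Series(dates, index=[1, 3, 5])},
                  index=[0, 1, 2, 3, 4, 5])
threshold = 0.65
null_date = pd.Timestamp('2022-09-01 09:00:00')

# Fill only rows that are currently NaT AND whose A meets the threshold;
# existing datetimes (e.g. row 3) are left untouched
to_fill = df['timestamp'].isna() & (df['A'] >= threshold)
df.loc[to_fill, 'timestamp'] = null_date
print(df)
```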