I am trying to create a column from two columns based on a condition.
I have two columns "date_1" and "date_2", I am interested in the column "date_2" but it contains NaT values, for this reason I would like to create a new column "date_3" with the values of column 2, and replace the NaT values with the dates from the "date_1" column.
I am trying with np.where method, but the output is not the correct. Below is the code I am using.
date_1 = ('2021-05-25 00:00:00','2021-05-28 00:00:00', '2021-05-29 00:00:00', '2021-05-29 00:00:00')
date_2 = ('2021-05-21 00:00:00',pd.NaT, '2021-05-26 00:00:00', pd.NaT)
df= pd.DataFrame(list(zip(date_1, date_2)), columns = ['date_1','date_2'])
df['date_1'] = pd.to_datetime(df['date_1'])
df['date_2'] = pd.to_datetime(df['date_2'])
df["date_3"] = np.where(df['date_1'] == pd.NaT, df['date_2'], df['date_1'])
CodePudding user response:
I think there is no need to use np.where. This is my solution to your problem:
df["date_3"] = df["date_2"]
df["date_3"] = df["date_3"].fillna(value=df["date_1"])
print(df)
Output:
date_1 date_2 date_3
0 2021-05-25 2021-05-21 2021-05-21
1 2021-05-28 NaT 2021-05-28
2 2021-05-29 2021-05-26 2021-05-26
3 2021-05-29 NaT 2021-05-29