I have a DataFrame in Python Pandas like below:
col1                | col2
--------------------|--------------------
2020-01-10 19:45:49 | 2020-01-11 00:00:00
2020-01-24 20:14:33 | 2020-01-24 00:00:00
2020-01-24 11:43:15 | 2020-01-20 00:00:00
NaN                 | 2020-08-14 00:00:00
...                 | ...
- col1 has the 'object' data type and may contain NaN values
- col2 has the 'object' data type and does not contain NaN values
I need to create col3, which shows how many days have passed since the date in col2 (col1 - col2):
- if col1 is NaN, col3 should also be NaN
- if the value in col2 is later than the value in col1, col3 should be negative
- if the value in col2 is earlier than the value in col1, col3 should be positive
- if the values in col2 and col1 are the same, col3 should be 0
So as a result I need something like below:
col1                | col2                | col3
--------------------|---------------------|-----
2020-01-10 19:45:49 | 2020-01-11 00:00:00 | -1
2020-01-24 20:14:33 | 2020-01-24 00:00:00 | 0
2020-01-24 11:43:15 | 2020-01-20 00:00:00 | 4
NaN                 | 2020-08-14 00:00:00 | NaN
...                 | ...                 | ...
How can I do that in Python Pandas?
CodePudding user response:
Try:
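import pandas as pd

# ensure both columns are datetime (NaN in col1 becomes NaT):
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])

# whole-day part of the difference; rows where col1 is NaT give NaN
df["col3"] = (df["col1"] - df["col2"]).dt.days
print(df)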
Prints:
                 col1       col2  col3
0 2020-01-10 19:45:49 2020-01-11  -1.0
1 2020-01-24 20:14:33 2020-01-24   0.0
2 2020-01-24 11:43:15 2020-01-20   4.0
3                 NaT 2020-08-14   NaN
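col3 comes out as float (-1.0, 0.0, ...) because a plain integer column cannot hold NaN. If whole numbers are preferred, one possible variation is to cast to pandas' nullable integer dtype. The sketch below is self-contained and rebuilds the sample rows from the question as an assumption about what the real data looks like; it is not the only way to do this.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; both columns start out as 'object' dtype.
df = pd.DataFrame(
    {
        "col1": ["2020-01-10 19:45:49", "2020-01-24 20:14:33",
                 "2020-01-24 11:43:15", np.nan],
        "col2": ["2020-01-11 00:00:00", "2020-01-24 00:00:00",
                 "2020-01-20 00:00:00", "2020-08-14 00:00:00"],
    }
)

# Parse the strings; the missing value in col1 becomes NaT.
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])

# .dt.days gives floats when a NaT is present; "Int64" (capital I) is the
# nullable integer dtype, so the missing row is shown as <NA> instead of NaN.
df["col3"] = (df["col1"] - df["col2"]).dt.days.astype("Int64")
print(df)
```

Note that .dt.days takes the days component of each timedelta, which is floored toward negative infinity. That is why the first row, where col1 is only about four hours before col2, gives -1 rather than 0.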