How to create a new column with the number of days between the dates in 2 other columns of a DataFrame in Pandas

Time:08-01

I have a DataFrame in Python Pandas like the one below:

col1                | col2
--------------------|--------------------
2020-01-10 19:45:49 | 2020-01-11 00:00:00
2020-01-24 20:14:33 | 2020-01-24 00:00:00
2020-01-24 11:43:15 | 2020-01-20 00:00:00
NaN                 | 2020-08-14 00:00:00
...                 | ...
  • col1 is of 'object' data type and can contain 'NaN' values
  • col2 is of 'object' data type and does not contain 'NaN' values
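To experiment with this, a minimal reproduction of the input could look like the sketch below. The four rows are copied from the table above; passing `dtype=object` is an assumption made so that both columns have the 'object' dtype described in the question, with the missing value in col1 stored as `np.nan`.

```python
import numpy as np
import pandas as pd

# Rebuild the sample input: both columns hold date strings (object dtype)
# and col1 contains one missing value.
df = pd.DataFrame(
    {
        "col1": ["2020-01-10 19:45:49", "2020-01-24 20:14:33",
                 "2020-01-24 11:43:15", np.nan],
        "col2": ["2020-01-11 00:00:00", "2020-01-24 00:00:00",
                 "2020-01-20 00:00:00", "2020-08-14 00:00:00"],
    },
    dtype=object,
)

print(df.dtypes)
```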

I need to create col3, which shows how many days have passed since the date in col2 (col1 - col2):

  • if col1 is NaN, col3 should also be NaN
  • if the value in col2 is later than the value in col1, col3 should be negative
  • if the value in col2 is earlier than the value in col1, col3 should be positive
  • if the values in col2 and col1 fall on the same day, col3 should be 0

So the result should look something like this:

col1                | col2                | col3
--------------------|---------------------|----
2020-01-10 19:45:49 | 2020-01-11 00:00:00 | -1
2020-01-24 20:14:33 | 2020-01-24 00:00:00 | 0
2020-01-24 11:43:15 | 2020-01-20 00:00:00 | 4
NaN                 | 2020-08-14 00:00:00 | NaN
...                 | ...                 | ...
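The expected col3 values follow from how a pandas `Timedelta` reports whole days: its `.days` attribute is the floor of the difference, so a col1 that is a few hours short of col2 gives -1 rather than 0. Because every col2 value in the sample is at midnight, this floor equals the number of calendar days between the two dates. A quick check on the three non-missing rows:

```python
import pandas as pd

# Subtracting two Timestamps gives a Timedelta; .days is the floor of the
# difference in whole days (the remaining hours/minutes are always >= 0).
pairs = [
    ("2020-01-10 19:45:49", "2020-01-11 00:00:00"),  # col1 earlier than col2
    ("2020-01-24 20:14:33", "2020-01-24 00:00:00"),  # same calendar day
    ("2020-01-24 11:43:15", "2020-01-20 00:00:00"),  # col1 later than col2
]

days = [(pd.Timestamp(c1) - pd.Timestamp(c2)).days for c1, c2 in pairs]
print(days)  # [-1, 0, 4]
```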

How can I do that in Python Pandas?

Answer:

Try:

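import pandas as pd

# ensure datetime: convert both 'object' columns; NaN in col1 becomes NaT
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])

# col1 - col2 is a timedelta; .dt.days keeps its whole-day part (NaT -> NaN)
df["col3"] = (df["col1"] - df["col2"]).dt.days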

print(df)

Prints:

                 col1       col2  col3
0 2020-01-10 19:45:49 2020-01-11  -1.0
1 2020-01-24 20:14:33 2020-01-24   0.0
2 2020-01-24 11:43:15 2020-01-20   4.0
3                 NaT 2020-08-14   NaN
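Note that col3 comes out as floats (-1.0, 0.0, 4.0) because the missing row forces a float column, whereas the question shows whole numbers. A sketch of one way to keep integers, assuming pandas 1.0 or later for the nullable `Int64` dtype; the `errors="coerce"` argument is an added assumption that unparseable strings should become NaT instead of raising:

```python
import numpy as np
import pandas as pd

# Sample input as described in the question (object dtype, NaN in col1).
df = pd.DataFrame(
    {
        "col1": ["2020-01-10 19:45:49", "2020-01-24 20:14:33",
                 "2020-01-24 11:43:15", np.nan],
        "col2": ["2020-01-11 00:00:00", "2020-01-24 00:00:00",
                 "2020-01-20 00:00:00", "2020-08-14 00:00:00"],
    },
    dtype=object,
)

# Parse both columns; errors="coerce" maps unparseable values to NaT
# (assumption: bad input should be treated as missing, not raise).
df["col1"] = pd.to_datetime(df["col1"], errors="coerce")
df["col2"] = pd.to_datetime(df["col2"], errors="coerce")

# .dt.days yields float64 with NaN for the missing row; casting to the
# nullable "Int64" dtype keeps whole numbers and shows <NA> instead.
df["col3"] = (df["col1"] - df["col2"]).dt.days.astype("Int64")

print(df)
```

If float output is acceptable, the answer's version is enough; the cast only changes how col3 is stored and displayed.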