I have a sample data that looks like this.
Column DateDuration was calculated in Excel, following below logic:
- DateDuration between SecondDate and FirstDate >= 28, then DateDuration = SecondDate - FirstDate.
- DateDuration between SecondDate and FirstDate <28, if ThirdDate = nan, then DateDuration = SecondDate - FirstDate.
- DateDuration between SecondDate and FirstDate <28, if ThirdDate = not nan, then consider (ThirdDate - FirstDate):
- ThirdDate - FirstDate >= 28, then DateDuration = ThirdDate - FirstDate.
- ThirdDate - FirstDate < 28, if FourthDate = nan, then DateDuration = ThirdDate - FirstDate.
- ThirdDate - FirstDate < 28, if FourthDate = not nan, then DateDuration = FourthDate - FirstDate.
I would like to calculate DateDuration in Python but do not know how to do about this.
Types of data in Python:
- ID int64
- FirstDate object
- SecondDate object
- ThirdDate object
- FourthDate object
I am new to Python. Any help would be greatly appreciated!!
import pandas as pd
import numpy as np
df['FirstDate'] = pd.to_datetime(df['FirstDate'])
df['SecondDate'] = pd.to_datetime(df['SecondDate'])
df['DayDifference2'] = (df['SecondDate']) -(df['FirstDate'])
df['DayDifference3'] = (df['ThirdDate']) -(df['FirstDate'])
df['DayDifference4'] = (df['FourthDate']) -(df['FirstDate'])
x = df['DayDifference2'].dt.days
y = df['DayDifference3'].dt.days
z = df['DayDifference4'].dt.days
condlist = [x<28, x>=28]
choicelist = [(df['ThirdDate']) -(df['FirstDate']), (df['SecondDate']) -(df['FirstDate'])]
np.select(condlist, choicelist)
...
| ID | FirstDate | SecondDate | ThirdDate |FourthDate | DateDuration |
| -------- | -------------- | -------- | -------------- | -------- | -------------- |
|2914300|2021-09-23|2021-10-07|2021-11-29|2021-12-20|67|
|3893461|2021-09-08|2021-10-06|2022-04-07||211|
|4343075|2021-06-23|2021-09-27|||96|
|4347772|2021-06-23|2021-09-27|||96|
|4551963|2021-08-02|2021-10-14|2022-03-11||73|
|4893324|2021-09-30|2021-10-01|2022-03-03|2022-03-10|154|
|5239991|2021-06-24|2021-08-26|2021-09-25|2022-02-03|63|
|8454947|2021-09-28|2021-10-05|||7|
|8581390|2021-09-27|2022-03-21|2022-03-25||175|
|8763766|2021-09-20|2021-10-04|2021-12-09||80|
|9144185|2021-06-18|2021-06-23|||5|
|9967685|2021-09-13|2021-10-29|2022-02-07|2022-03-23|46|
|11367560|2021-08-31|2021-09-28|2021-10-21|2022-02-11|51|
CodePudding user response:
Refer to the time module built-in. It allows for more time class types that I actually used for my own workout routine maker.
import datetime as dt
# particularly the types:
dt.timedelta(1)
# and
dt.time(minute=0, second=0)
# there are also date classes you can use.
Documentation:
NaT
is a missing value of datetime64[ns]
type
Conditions & Choices
conditions = [
(df['SecondDate'] - df['FirstDate']).dt.days >= 28,
((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].isna(),
((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].notna() & ((df['ThirdDate'] - df['FirstDate']).dt.days >= 28),
((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].notna() & ((df['ThirdDate'] - df['FirstDate']).dt.days < 28) & df['FourthDate'].isna(),
((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].notna() & ((df['ThirdDate'] - df['FirstDate']).dt.days < 28) & df['FourthDate'].notna()
]
choices = [
(df['SecondDate'] - df['FirstDate']).dt.days,
(df['SecondDate'] - df['FirstDate']).dt.days,
(df['ThirdDate'] - df['FirstDate']).dt.days,
(df['ThirdDate'] - df['FirstDate']).dt.days,
(df['FourthDate'] - df['FirstDate']).dt.days
]
df['Duration'] = np.select(conditions, choices)
df
Discussion:
There are some differences, e.g., second row, ID = 3893461
, according to your conditions(DateDuration between SecondDate and FirstDate >= 28, then DateDuration = SecondDate - FirstDate.), SecondDate
- FirstDate
of ID = 3893461
is 28
, same thing happened on last row, ID = 11367560