Home > Mobile >  How to do conditional calculations in Python
How to do conditional calculations in Python

Time:07-13

I have a sample data that looks like this.

Column DateDuration was calculated in Excel, following below logic:

  • DateDuration between SecondDate and FirstDate >= 28, then DateDuration = SecondDate - FirstDate.
  • DateDuration between SecondDate and FirstDate <28, if ThirdDate = nan, then DateDuration = SecondDate - FirstDate.
  • DateDuration between SecondDate and FirstDate <28, if ThirdDate = not nan, then consider (ThirdDate - FirstDate):
    • ThirdDate - FirstDate >= 28, then DateDuration = ThirdDate - FirstDate.
    • ThirdDate - FirstDate < 28, if FourthDate = nan, then DateDuration = ThirdDate - FirstDate.
    • ThirdDate - FirstDate < 28, if FourthDate = not nan, then DateDuration = FourthDate - FirstDate.

I would like to calculate DateDuration in Python but do not know how to do about this.

Types of data in Python:

  • ID int64
  • FirstDate object
  • SecondDate object
  • ThirdDate object
  • FourthDate object

I am new to Python. Any help would be greatly appreciated!!

import pandas as pd
import numpy as np
df['FirstDate'] = pd.to_datetime(df['FirstDate'])
df['SecondDate'] = pd.to_datetime(df['SecondDate'])
df['DayDifference2'] = (df['SecondDate']) -(df['FirstDate'])
df['DayDifference3'] = (df['ThirdDate']) -(df['FirstDate'])
df['DayDifference4'] = (df['FourthDate']) -(df['FirstDate'])
    
x = df['DayDifference2'].dt.days
y = df['DayDifference3'].dt.days
z = df['DayDifference4'].dt.days

condlist = [x<28, x>=28]
choicelist = [(df['ThirdDate']) -(df['FirstDate']), (df['SecondDate']) -(df['FirstDate'])]
np.select(condlist, choicelist)
...
| ID       | FirstDate      | SecondDate | ThirdDate      |FourthDate | DateDuration    |
| -------- | -------------- | --------   | -------------- | --------  | -------------- |
|2914300|2021-09-23|2021-10-07|2021-11-29|2021-12-20|67|
|3893461|2021-09-08|2021-10-06|2022-04-07||211|
|4343075|2021-06-23|2021-09-27|||96|
|4347772|2021-06-23|2021-09-27|||96|
|4551963|2021-08-02|2021-10-14|2022-03-11||73|
|4893324|2021-09-30|2021-10-01|2022-03-03|2022-03-10|154|
|5239991|2021-06-24|2021-08-26|2021-09-25|2022-02-03|63|
|8454947|2021-09-28|2021-10-05|||7|
|8581390|2021-09-27|2022-03-21|2022-03-25||175|
|8763766|2021-09-20|2021-10-04|2021-12-09||80|
|9144185|2021-06-18|2021-06-23|||5|
|9967685|2021-09-13|2021-10-29|2022-02-07|2022-03-23|46|
|11367560|2021-08-31|2021-09-28|2021-10-21|2022-02-11|51|

CodePudding user response:

Refer to the time module built-in. It allows for more time class types that I actually used for my own workout routine maker.

import datetime as dt

# particularly the types:
dt.timedelta(1)
# and
dt.time(minute=0, second=0)
# there are also date classes you can use. 

Documentation: enter image description here NaT is a missing value of datetime64[ns] type




Conditions & Choices

conditions = [
    (df['SecondDate'] - df['FirstDate']).dt.days >= 28,
    ((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].isna(),
    ((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].notna() & ((df['ThirdDate'] - df['FirstDate']).dt.days >= 28),
    ((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].notna() & ((df['ThirdDate'] - df['FirstDate']).dt.days < 28) & df['FourthDate'].isna(),
    ((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].notna() & ((df['ThirdDate'] - df['FirstDate']).dt.days < 28) & df['FourthDate'].notna()
]

choices = [
    (df['SecondDate'] - df['FirstDate']).dt.days,
    (df['SecondDate'] - df['FirstDate']).dt.days,
    (df['ThirdDate'] - df['FirstDate']).dt.days,
    (df['ThirdDate'] - df['FirstDate']).dt.days,
    (df['FourthDate'] - df['FirstDate']).dt.days
]



df['Duration'] = np.select(conditions, choices)
df

Result enter image description here



Discussion: There are some differences, e.g., second row, ID = 3893461, according to your conditions(DateDuration between SecondDate and FirstDate >= 28, then DateDuration = SecondDate - FirstDate.), SecondDate - FirstDate of ID = 3893461 is 28, same thing happened on last row, ID = 11367560

  • Related