Home > Mobile >  New pandas DataFrame column from datetime calculation
New pandas DataFrame column from datetime calculation

Time:12-06

I am trying to calculate the number of days that have elapsed since the launch of a marketing campaign. I have one row per date for each marketing campaign in my DataFrame (df) and all dates start from the same day (though there is not a data point for each day for each campaign). In column 'b' I have the date relating to the data points of interest (dateime64[ns]) and in column 'c' I have the launch date of the marketing campaign (dateime64[ns]). I would like the resulting calculation to return n/a (or np.NaN or a suitable alternative) when column 'b' is earlier than column 'c', else I would like the calculation to return the difference the two dates.

Campaign Date Launch Date Desired Column
A 2019-09-01 2022-12-01 n/a
A 2019-09-02 2022-12-01 n/a
B 2019-09-01 2019-09-01 0
B 2019-09-25 2019-09-01 24

When I try:

df['Days Since Launch'] = df['Date'] - df['Launch Date']

What I would hope returns a negative value actually returns a positive one, thus leading to duplicate values when I have dates that are 10 days prior and 10 days after the launch date.

When I try:

df['Days Since Launch'] = np.where(df['Date'] < df['Launch Date'], XXX, df['Date'] - df['Launch Date'])

Where XXX has to be the same data type as the two input columns, so I can't enter np.NaN because the calculation will fail, nor can I enter a date as this will still leave the same issue that i want to solve. IF statements do not work as the "truth value of a Series is ambiguous". Any ideas?

CodePudding user response:

You can use a direct subtraction and conversion to days with dt.days, then mask the negative values with where:

s = pd.to_datetime(df['Date']).sub(pd.to_datetime(df['Launch Date'])).dt.days
# or, if already datetime:
#s = df['Date'].sub(df['Launch Date']).dt.days

df['Desired Column'] = s.where(s.ge(0))

Alternative closer to your initial attempt, using mask:

df['Desired Column'] = (df['Date'].sub(df['Launch Date'])
                         .mask(df['Date'] < df['Launch Date'])
                       )

Output:

  Campaign        Date Launch Date  Desired Column
0        A  2019-09-01  2022-12-01             NaN
1        A  2019-09-02  2022-12-01             NaN
2        B  2019-09-01  2019-09-01             0.0
3        B  2019-09-25  2019-09-01            24.0

CodePudding user response:

Add Series.dt.days for convert timedeltas to days:

df['Days Since Launch'] = np.where(df['Date'] < df['Launch Date'], 
                                   np.nan, 
                                   (df['Date'] - df['Launch Date']).dt.days)

print (df)

  Campaign       Date Launch Date  Desired Column  Days Since Launch
0        A 2019-09-01  2022-12-01             NaN                NaN
1        A 2019-09-02  2022-12-01             NaN                NaN
2        B 2019-09-01  2019-09-01             0.0                0.0
3        B 2019-09-25  2019-09-01            24.0               24.0

CodePudding user response:

Another alternative:

df["Date"] = pd.to_datetime(df["Date"])
df["Launch Date"] = pd.to_datetime(df["Launch Date"])
df["Desired Column"] = df.apply(lambda x: x["Date"] - x["Launch Date"] if x["Date"] >= x["Launch Date"] else None, axis=1)

  • Related