I am trying to calculate the number of days that have elapsed since the launch of a marketing campaign. I have one row per date for each marketing campaign in my DataFrame (df) and all dates start from the same day (though there is not a data point for each day for each campaign). In column 'b' I have the date relating to the data points of interest (dateime64[ns]) and in column 'c' I have the launch date of the marketing campaign (dateime64[ns]). I would like the resulting calculation to return n/a (or np.NaN or a suitable alternative) when column 'b' is earlier than column 'c', else I would like the calculation to return the difference the two dates.
Campaign | Date | Launch Date | Desired Column |
---|---|---|---|
A | 2019-09-01 | 2022-12-01 | n/a |
A | 2019-09-02 | 2022-12-01 | n/a |
B | 2019-09-01 | 2019-09-01 | 0 |
B | 2019-09-25 | 2019-09-01 | 24 |
When I try:
df['Days Since Launch'] = df['Date'] - df['Launch Date']
What I would hope returns a negative value actually returns a positive one, thus leading to duplicate values when I have dates that are 10 days prior and 10 days after the launch date.
When I try:
df['Days Since Launch'] = np.where(df['Date'] < df['Launch Date'], XXX, df['Date'] - df['Launch Date'])
Where XXX has to be the same data type as the two input columns, so I can't enter np.NaN because the calculation will fail, nor can I enter a date as this will still leave the same issue that i want to solve. IF statements do not work as the "truth value of a Series is ambiguous". Any ideas?
CodePudding user response:
You can use a direct subtraction and conversion to days with dt.days
, then mask the negative values with where
:
s = pd.to_datetime(df['Date']).sub(pd.to_datetime(df['Launch Date'])).dt.days
# or, if already datetime:
#s = df['Date'].sub(df['Launch Date']).dt.days
df['Desired Column'] = s.where(s.ge(0))
Alternative closer to your initial attempt, using mask
:
df['Desired Column'] = (df['Date'].sub(df['Launch Date'])
.mask(df['Date'] < df['Launch Date'])
)
Output:
Campaign Date Launch Date Desired Column
0 A 2019-09-01 2022-12-01 NaN
1 A 2019-09-02 2022-12-01 NaN
2 B 2019-09-01 2019-09-01 0.0
3 B 2019-09-25 2019-09-01 24.0
CodePudding user response:
Add Series.dt.days
for convert timedeltas to days:
df['Days Since Launch'] = np.where(df['Date'] < df['Launch Date'],
np.nan,
(df['Date'] - df['Launch Date']).dt.days)
print (df)
Campaign Date Launch Date Desired Column Days Since Launch
0 A 2019-09-01 2022-12-01 NaN NaN
1 A 2019-09-02 2022-12-01 NaN NaN
2 B 2019-09-01 2019-09-01 0.0 0.0
3 B 2019-09-25 2019-09-01 24.0 24.0
CodePudding user response:
Another alternative:
df["Date"] = pd.to_datetime(df["Date"])
df["Launch Date"] = pd.to_datetime(df["Launch Date"])
df["Desired Column"] = df.apply(lambda x: x["Date"] - x["Launch Date"] if x["Date"] >= x["Launch Date"] else None, axis=1)