Lets say I have a column 'MEMBERSHIP_LENGTH' with the value is integer for example, the value is 100 means this id had been a member for 100 days. I want to know what date this id applied for a membership from today.
So what I am thinking is
df['APPLIED_DATE'] = pd.to_datetime('2023-01-21') - timedelta(days=df['MEMBERSHIP_LENGTH'])
but I got an error TypeError: unsupported type for timedelta days component: Series.
How should I do it?
CodePudding user response:
The error you're getting is because timedelta
expects a single integer value for the days argument, but you're passing it a column of values (df['MEMBERSHIP_LENGTH']
). To solve this, you can use the apply()
function to apply the subtraction to each value in the column. Here's an example:
df['APPLIED_DATE'] = pd.to_datetime('2023-01-21') - df['MEMBERSHIP_LENGTH'].apply(lambda x: timedelta(days=x))
This will subtract the value of the MEMBERSHIP_LENGTH
column from the date '2023-01-21' for each row in the dataframe, and store the result in a new column 'APPLIED_DATE
'.
Another way to do this is by using pd.to_timedelta
and subtract as well.
df['APPLIED_DATE'] = pd.to_datetime('2023-01-21') - pd.to_timedelta(df['MEMBERSHIP_LENGTH'], unit='d')
This is also more efficient than using apply()
function.
CodePudding user response:
How about
from datetime import datetime, timedelta
df['APPLIED_DATE']=df['MEMBERSHIP_LENGTH'].apply(lambda x: datetime.today() - timedelta(days=x))
df
Out[28]:
MEMBERSHIP_LENGTH APPLIED_DATE
0 100 2022-10-13 09:56:49.854174
1 101 2022-10-12 09:56:49.854174
2 200 2022-07-05 09:56:49.854174
3 201 2022-07-04 09:56:49.854174