I want to create a new column if survey_start date is with 12 months prior to start_date then 1 else 0
ID | survey_date | start_date |
---|---|---|
1 | 2019-05-01 | 2022-05-01 |
1 | 2021-05-01 | 2022-05-01 |
1 | 2023-05-01 | 2022-05-01 |
1 | 2021-12-01 | 2022-05-01 |
2 | 2010-01-01 | 2010-02-01 |
2 | 2011-05-01 | 2010-02-01 |
output
ID | survey_date | start_date | new_column |
---|---|---|---|
1 | 2019-05-01 | 2022-05-01 | 0 |
1 | 2021-05-01 | 2022-05-01 | 1 |
1 | 2023-05-01 | 2022-05-01 | 0 |
1 | 2021-12-01 | 2022-05-01 | 1 |
2 | 2010-01-01 | 2010-02-01 | 1 |
2 | 2011-05-01 | 2010-02-01 | 0 |
How do I do this in pandas dataframe?
CodePudding user response:
You can use relativedelta
?
from dateutil.relativedelta import *
df['survey_date'] = pd.to_datetime(df['survey_date'])
df['start_date'] = pd.to_datetime(df['start_date'])
df['new column'] = df.apply(lambda x: relativedelta(x['start_date'], x['survey_date']), axis=1)
df['new column'] = df['new column'].apply(lambda x: 12*x.years x.months)
df['new column'] = df['new column'].apply(lambda x: 1 if (x <= 12) & (x >= 0) else 0)
print(df):
ID survey_date start_date new column
0 1 2019-05-01 2022-05-01 0
1 1 2021-05-01 2022-05-01 1
2 1 2023-05-01 2022-05-01 0
3 1 2021-12-01 2022-05-01 1
4 2 2010-01-01 2010-02-01 1
5 2 2011-05-01 2010-02-01 0
CodePudding user response:
here is one way to do it using np.where
df['new_col'] = np.where(((df['start_date'].astype('datetime64') - df['survey_date'].astype('datetime64') ).dt.days).between(0,366),1, 0)
df
ID survey_date start_date new_col
0 1 2019-05-01 2022-05-01 0
1 1 2021-05-01 2022-05-01 1
2 1 2023-05-01 2022-05-01 0
3 1 2021-12-01 2022-05-01 1
4 2 2010-01-01 2010-02-01 1
5 2 2011-05-01 2010-02-01 0