I have a data frame like this:
id | date |
---|---|
a | 20-01-2020 |
a | 22-03-2020 |
a | 15-03-2020 |
b | 05-02-2019 |
b | 09-05-2019 |
- Loop over the data frame.
- For each row within an (id) group, except the first row of the group, compute the difference in days between date[i] and date[i-1]. If the result is < 30 or > 56, put "no" in the 'status' column.
- Otherwise, the 'status' for that row should be "yes".

The output data frame must be:
id | date | status |
---|---|---|
a | 20-01-2020 | |
a | 20-03-2020 | yes |
a | 15-04-2020 | no |
b | 05-02-2019 | |
b | 09-05-2019 | no |
CodePudding user response:
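Try this:
import numpy as np
import pandas as pd

# Parse day-first strings such as 20-01-2020, in case the column is not datetime yet.
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
# Object dtype so the column can hold both NaN and the 'yes'/'no' strings.
df['status'] = pd.Series(np.nan, index=df.index, dtype=object)
for i in range(1, len(df)):
    # Skip the first row of each id group.
    if df.loc[i, 'id'] != df.loc[i - 1, 'id']:
        continue
    days = (df.loc[i, 'date'] - df.loc[i - 1, 'date']).days
    # Use .loc to avoid chained assignment.
    df.loc[i, 'status'] = 'no' if days < 30 or days > 56 else 'yes'
print(df)
With the input from the question, the output is:
id date status
0 a 2020-01-20 NaN
1 a 2020-03-22 no
2 a 2020-03-15 no
3 b 2019-02-05 NaN
4 b 2019-05-09 no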
CodePudding user response:
You can use shift to calculate the difference between consecutive rows, then replace the values based on the condition:
diff = (df['date'] - df['date'].shift()).dt.days
df['status'] = ((diff < 30) | (diff > 65)).map({True: 'No', False: 'Yes'})
df['status'] = df['status'].where(diff.notna(), '')  # leave the first row as an empty string
Output
id date status
0 a 2020-01-20
1 a 2020-03-22 Yes
2 a 2020-04-15 No
3 b 2019-02-05 No
4 b 2019-05-09 No
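The row-to-row difference can also be restricted to each id group with groupby, so the first row of every group stays empty. The following is only a minimal sketch, assuming day-first date strings and the 30/56 day bounds stated in the question; add_status and its low/high parameters are hypothetical names introduced for illustration.

```python
import pandas as pd


def add_status(df, low=30, high=56):
    """Return a copy of df with a 'status' column computed per id group.

    low/high are the day bounds from the question (assumed inclusive for "yes").
    """
    out = df.copy()
    # Assumption: dates are day-first strings such as 20-01-2020.
    out["date"] = pd.to_datetime(out["date"], format="%d-%m-%Y")
    # Days since the previous row of the same id; NaN on each group's first row.
    gap = out.groupby("id")["date"].diff().dt.days
    in_range = gap.between(low, high)
    out["status"] = ""  # first row of each group stays empty
    out.loc[gap.notna() & in_range, "status"] = "yes"
    out.loc[gap.notna() & ~in_range, "status"] = "no"
    return out


# Input data copied from the question.
df = pd.DataFrame(
    {
        "id": ["a", "a", "a", "b", "b"],
        "date": ["20-01-2020", "22-03-2020", "15-03-2020", "05-02-2019", "09-05-2019"],
    }
)
result = add_status(df)
print(result)
```

Because the difference is taken inside each group, the first "b" row is no longer compared with the last "a" row. If the other threshold is the intended one, it can be passed as high=65.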