How to find difference between current date row and previous date row per each group?

Time:09-06

I have a data frame like this:

id date
a 20-01-2020
a 22-03-2020
a 15-03-2020
b 05-02-2019
b 09-05-2019
  • loop over the data frame
  • for each row within an (id) group, except the first row of the group, compute the difference between date[i] and date[i-1]; if the result is <30 or >56, put "no" in the 'status' column
  • otherwise the 'status' for that row should be "yes"

The output data frame must be:
id date status
a 20-01-2020
a 20-03-2020 yes
a 15-04-2020 no
b 05-02-2019
b 09-05-2019 no

CodePudding user response:

try this:

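import pandas as pd

# Parse day-first strings such as 20-01-2020 (in case the column is not datetime yet).
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')

df['status'] = pd.Series(dtype='object')  # all NaN; the first row of each group stays NaN
for i in range(1, len(df)):  # assumes the default 0..n-1 index
    if df.loc[i, 'id'] != df.loc[i - 1, 'id']:
        continue  # first row of a new group: nothing to compare against
    days = (df.loc[i, 'date'] - df.loc[i - 1, 'date']).days
    # .loc avoids chained assignment, which may not write back to df
    df.loc[i, 'status'] = 'no' if days < 30 or days > 56 else 'yes'
print(df)

the output is:

  id       date status
0  a 2020-01-20    NaN
1  a 2020-03-22     no
2  a 2020-03-15     no
3  b 2019-02-05    NaN
4  b 2019-05-09     no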

CodePudding user response:

You can use shift to calculate the difference between consecutive rows, then replace the result with your values based on the condition:

# Days since the previous row of the same id (NaN for the first row of each group)
days = (df['date'] - df.groupby('id')['date'].shift()).dt.days
# 65 is kept from this answer; the question text says 56
df['status'] = ((days < 30) | (days > 65)).map({True: 'No', False: 'Yes'})
df.loc[days.isna(), 'status'] = ''  # if you want the first row of each group to be an empty string

Output

  id       date status
0  a 2020-01-20       
1  a 2020-03-22    Yes
2  a 2020-04-15     No
3  b 2019-02-05       
4  b 2019-05-09     No
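
For anyone who wants to run the idea end to end, here is a minimal self-contained sketch of the per-group comparison, using the sample rows from the question. The helper name label_gaps and its low/high parameters are made up for illustration and are not part of pandas or of either answer. It assumes the dates are day-month-year strings and that the rows are already in the order you want to compare.

```python
import pandas as pd

# Sample rows copied from the question (dates are day-month-year strings).
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "date": ["20-01-2020", "22-03-2020", "15-03-2020", "05-02-2019", "09-05-2019"],
})
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")


def label_gaps(frame, low=30, high=56):
    """Hypothetical helper: '' for the first row of each id, else 'yes' or 'no'."""
    # diff() runs within each id group, so the first row of every group is NaN.
    days = frame.groupby("id")["date"].diff().dt.days
    status = pd.Series("yes", index=frame.index)
    status[(days < low) | (days > high)] = "no"
    status[days.isna()] = ""
    return status


df["status"] = label_gaps(df)
print(df)
```

Because the grouping is done by groupby, the first row of group b is never compared with the last row of group a. Passing high=65 gives the threshold used in the second answer instead of the 56 stated in the question.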