Home > other >  Compare current row timestamp with previous row with condition in function python
Compare current row timestamp with previous row with condition in function python

Time:05-03

I have a df sample with one of the columns named date_code, dtype datetime64[ns]:

date_code
2022-03-28
2022-03-29
2022-03-30
2022-03-31
2022-04-01
2022-04-07
2022-04-07
2022-04-08
2022-04-12
2022-04-12
2022-04-14
2022-04-14
2022-04-15
2022-04-16
2022-04-16
2022-04-17
2022-04-18
2022-04-19
2022-04-20
2022-04-20
2022-04-21
2022-04-22
2022-04-25
2022-04-25
2022-04-26

I would like to create a column based on some conditions comparing current row with previous. I trying to create a function like:

def start_date(row):
    if (row['date_code'] - row['date_code'].shift(-1)).days >1:
        val = row['date_code'].shift(-1)
    elif row['date_code'] == row['date_code'].shift(-1):
        val = row['date_code']
    else:
        val = np.nan()
    return val

But once I apply

sample['date_zero_recorded'] = sample.apply(start_date, axis=1)

I get error:

AttributeError: 'Timestamp' object has no attribute 'shift'

How I should compare current row with previous with condition?

Edited: expected outoput

if current row more than previous by 2 or more, get previous

if current row equal past, get current

else, return NaN (incl. if current >1 than previous)

date_code   date_zero_recorded
2022-03-28  NaN
2022-03-29  NaN
2022-03-30  NaN
2022-03-31  NaN
2022-04-01  NaN
2022-04-07  2022-04-01
2022-04-07  2022-04-07
2022-04-08  NaN
2022-04-12  2022-04-08
2022-04-12  2022-04-12
2022-04-14  2022-04-12
2022-04-14  2022-04-14
2022-04-15  NaN
2022-04-16  NaN
2022-04-16  2022-04-16
2022-04-17  NaN
2022-04-18  NaN
2022-04-19  NaN
2022-04-20  NaN
2022-04-20  2022-04-20
2022-04-21  NaN
2022-04-22  NaN
2022-04-25  2022-04-22
2022-04-25  2022-04-25
2022-04-26  NaN

CodePudding user response:

You shouldn't use iterrows and use vectorial code instead.

For example:

sample['date_code'] = pd.to_datetime(sample['date_code'])

sample['date_zero_recorded'] = (
 sample['date_code'].shift()
 .where(sample['date_code'].diff().ne('1d'))
)

output:

    date_code date_zero_recorded
0  2022-03-28                NaT
1  2022-03-29                NaT
2  2022-03-30                NaT
3  2022-03-31                NaT
4  2022-04-01                NaT
5  2022-04-07         2022-04-01
6  2022-04-07         2022-04-07
7  2022-04-08                NaT
8  2022-04-12         2022-04-08
9  2022-04-12         2022-04-12
10 2022-04-14         2022-04-12
11 2022-04-14         2022-04-14
12 2022-04-15                NaT
13 2022-04-16                NaT
14 2022-04-16         2022-04-16
15 2022-04-17                NaT
16 2022-04-18                NaT
17 2022-04-19                NaT
18 2022-04-20                NaT
19 2022-04-20         2022-04-20
20 2022-04-21                NaT
21 2022-04-22                NaT
22 2022-04-25         2022-04-22
23 2022-04-25         2022-04-25
24 2022-04-26                NaT

  • Related