Home > Software engineering >  Rolling count of changes between row elements
Rolling count of changes between row elements

Time:11-04

I have a data frame that looks like this:

Identification Date (day/month/year) X Y
123 01/01/2022 100 abc
123 02/01/2022 200 acb
123 03/01/2022 300 ary
124 01/01/2022 200 abc
124 02/01/2022 900 abc
124 03/01/2022 900 abc

I am trying to create two separate 'change' columns, one for x and y separately, that is keeping a rolling count of how many times a given element is changing over time. I would like my output to look something like this:

Identification Date (day/month/year) X Y Change X Change Y
123 01/01/2022 100 abc 0 0
123 02/01/2022 200 acb 1 1
123 03/01/2022 300 ary 2 2
124 01/01/2022 200 abc 0 0
124 02/01/2022 900 abc 1 0
124 03/01/2022 900 abc 0 0

Any help would be greatly appreciated!

Thanks :)

CodePudding user response:

This is not the most performant but gets the job done:

def consec_count(arr):
    total = 0
    out = np.zeros(len(arr), dtype=np.int32)
    acc = arr[0]
    for idx, el in enumerate(arr):
        if el == acc:
            total = 0
        else:
            total  = 1
            acc = el
        out[idx] = total
    return out

df[['Change X', 'Change Y']] = df.groupby('Identification', 
                                   group_keys=False)[['X', 'Y']].transform(
                                                        lambda x : consec_count(x.values))

And the output:

   Identification Date (day/month/year)    X    Y  Change X  Change Y
0             123            01/01/2022  100  abc         0         0
1             123            02/01/2022  200  acb         1         1
2             123            03/01/2022  300  ary         2         2
3             124            01/01/2022  200  abc         0         0
4             124            02/01/2022  900  abc         1         0
5             124            03/01/2022  900  abc         0         0

CodePudding user response:

Here is another way to do this:

from itertools import accumulate
import pandas as pd

for col in df.columns[2:]:
    df[f'Change {col}'] = None
    for id, group in df.groupby('Identification'):
        df.loc[df['Identification'] == id, f'Change {col}'] = \
            list(accumulate(group.index[:-1], lambda x, y: x   1 if group.loc[y, col] != group.loc[y   1, col] else 0, initial=0))

df

   Identification Date(day/month/year)    X    Y Change X Change Y
0             123           01/01/2022  100  abc        0        0
1             123           02/01/2022  200  acb        1        1
2             123           03/01/2022  300  ary        2        2
3             124           01/01/2022  200  abc        0        0
4             124           02/01/2022  900  abc        1        0
5             124           03/01/2022  900  abc        0        0
  • Related