I have a data frame that looks like this:
Identification | Date (day/month/year) | X | Y |
---|---|---|---|
123 | 01/01/2022 | 100 | abc |
123 | 02/01/2022 | 200 | acb |
123 | 03/01/2022 | 300 | ary |
124 | 01/01/2022 | 200 | abc |
124 | 02/01/2022 | 900 | abc |
124 | 03/01/2022 | 900 | abc |
I am trying to create two separate 'change' columns, one for x and y separately, that is keeping a rolling count of how many times a given element is changing over time. I would like my output to look something like this:
Identification | Date (day/month/year) | X | Y | Change X | Change Y |
---|---|---|---|---|---|
123 | 01/01/2022 | 100 | abc | 0 | 0 |
123 | 02/01/2022 | 200 | acb | 1 | 1 |
123 | 03/01/2022 | 300 | ary | 2 | 2 |
124 | 01/01/2022 | 200 | abc | 0 | 0 |
124 | 02/01/2022 | 900 | abc | 1 | 0 |
124 | 03/01/2022 | 900 | abc | 0 | 0 |
Any help would be greatly appreciated!
Thanks :)
CodePudding user response:
This is not the most performant but gets the job done:
def consec_count(arr):
total = 0
out = np.zeros(len(arr), dtype=np.int32)
acc = arr[0]
for idx, el in enumerate(arr):
if el == acc:
total = 0
else:
total = 1
acc = el
out[idx] = total
return out
df[['Change X', 'Change Y']] = df.groupby('Identification',
group_keys=False)[['X', 'Y']].transform(
lambda x : consec_count(x.values))
And the output:
Identification Date (day/month/year) X Y Change X Change Y
0 123 01/01/2022 100 abc 0 0
1 123 02/01/2022 200 acb 1 1
2 123 03/01/2022 300 ary 2 2
3 124 01/01/2022 200 abc 0 0
4 124 02/01/2022 900 abc 1 0
5 124 03/01/2022 900 abc 0 0
CodePudding user response:
Here is another way to do this:
from itertools import accumulate
import pandas as pd
for col in df.columns[2:]:
df[f'Change {col}'] = None
for id, group in df.groupby('Identification'):
df.loc[df['Identification'] == id, f'Change {col}'] = \
list(accumulate(group.index[:-1], lambda x, y: x 1 if group.loc[y, col] != group.loc[y 1, col] else 0, initial=0))
df
Identification Date(day/month/year) X Y Change X Change Y
0 123 01/01/2022 100 abc 0 0
1 123 02/01/2022 200 acb 1 1
2 123 03/01/2022 300 ary 2 2
3 124 01/01/2022 200 abc 0 0
4 124 02/01/2022 900 abc 1 0
5 124 03/01/2022 900 abc 0 0