How to apply shift to a group when there is a change in value in pandas?

I currently have a dataset like the one below:

id  name date_and_hour
1   SS   1/1/2019 00:12
1   SS   1/1/2019 00:13
1   SS   1/1/2019 00:14
1   SB   1/1/2019 00:15
1   SS   1/1/2019 00:16
2   SE   1/1/2019 01:15
2   SR   1/1/2019 01:16
2   SS   1/1/2019 01:17
2   SS   1/1/2019 01:18

Within each id group, I want every row to show the next name that differs from the current one. The expected output looks like this:

id  name date_and_hour   next_name
1   SS   1/1/2019 00:12  SB
1   SS   1/1/2019 00:13  SB
1   SS   1/1/2019 00:14  SB
1   SB   1/1/2019 00:15  SS
1   SS   1/1/2019 00:16  null
2   SE   1/1/2019 01:15  SR
2   SR   1/1/2019 01:16  SS
2   SS   1/1/2019 01:17  SR
2   SR   1/1/2019 01:18  null

Please advise.

CodePudding user response：

Example

import pandas as pd

data = {'id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2},
        'name': {0: 'SS', 1: 'SS', 2: 'SS', 3: 'SB', 4: 'SE', 5: 'SR', 6: 'SR', 7: 'SS'}}
df = pd.DataFrame(data)

Code

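# keep the last row of each (id, name) pair and take the following name within each id
df['next_name'] = df.drop_duplicates(keep='last').groupby('id')['name'].shift(-1)
# back-fill within each id so the earlier rows of a run get the same next name
df['next_name'] = df.groupby('id')['next_name'].bfill()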

df

    id  name    next_name
0   1   SS      SB
1   1   SS      SB
2   1   SS      SB
3   1   SB      NaN
4   2   SE      SR
5   2   SR      SS
6   2   SR      SS
7   2   SS      NaN
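
One caveat if the frame also carries the date_and_hour column from the question: drop_duplicates with no arguments compares whole rows, so the distinct timestamps would stop any row from being dropped. A minimal sketch of the same idea restricted to id and name via the subset parameter is below. The timestamps are made-up values used only for illustration, and last_rows is a hypothetical helper name.

```python
import pandas as pd

# Same sample as above, plus an illustrative timestamp column (assumed values).
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "name": ["SS", "SS", "SS", "SB", "SE", "SR", "SR", "SS"],
    "date_and_hour": pd.date_range("2019-01-01 00:12", periods=8, freq="min"),
})

# Compare only id and name, so differing timestamps do not block deduplication.
last_rows = df.drop_duplicates(subset=["id", "name"], keep="last")

# Following name within each id, aligned back to df by index (NaN elsewhere).
df["next_name"] = last_rows.groupby("id")["name"].shift(-1)

# Back-fill within each id so every row of a run shares the same next name.
df["next_name"] = df.groupby("id")["next_name"].bfill()

print(df)
```

Like the original code, this still assumes each name forms a single run per id.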

If the same name appears in several separate runs within one id, as in the data below, the code needs a small change.

id  name
1   SS
1   SS
1   SB
1   SB
1   SS
1   SS
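
One possible way to handle this case, offered as a sketch and not as the answerer's own fix: compare each row with the following row inside its id, keep the following name only where it differs, and back-fill per id. The variable names nxt and changed are hypothetical.

```python
import pandas as pd

# The repeated-run sample shown above.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1],
    "name": ["SS", "SS", "SB", "SB", "SS", "SS"],
})

# Name on the following row within the same id (NaN on the last row of each id).
nxt = df.groupby("id")["name"].shift(-1)

# Keep the following name only where it differs from the current one,
# which happens on the last row of each run of equal names.
changed = nxt.where(nxt.ne(df["name"]))

# Spread each change backwards over the rest of its run, staying inside each id.
df["next_name"] = changed.groupby(df["id"]).bfill()

print(df)
```

Rows in the final run of an id have no later change, so they stay NaN.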

CodePudding user response：

Try this code:

import pandas as pd

raw_data = {'id': [1, 1, 1, 1, 2, 2, 2, 2],
            'name': ['SS', 'SS', 'SS', 'SB', 'SE', 'SR', 'SR', 'SS']}

df = pd.DataFrame(raw_data, columns=['id', 'name'])

next_name = []

# hard-coded mapping from each name to its next name; it only fits this sample data
for row in df['name']:
    if row == 'SS':
        next_name.append('SB')
    elif row == 'SE':
        next_name.append('SR')
    elif row == 'SR':
        next_name.append('SS')
    else:
        next_name.append(None)

# the last row has no following name
next_name[-1] = None

df['next_name'] = next_name

print(df)

CodePudding user response：

This should work:

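def lookahead(s):
    # i: integer code per row, c: unique names in order of first appearance
    i,c = pd.factorize(s)
    # map each code to the unique name that first appears after it
    return pd.Series(i).map(dict(enumerate(c[1:]))).tolist()

df.groupby('id')['name'].transform(lookahead)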

Output:

0     SB
1     SB
2     SB
3    NaN
4     SR
5     SS
6     SS
7    NaN