I currently have a dataset like this:
id name date_and_hour
1 SS 1/1/2019 00:12
1 SS 1/1/2019 00:13
1 SS 1/1/2019 00:14
1 SB 1/1/2019 00:15
1 SS 1/1/2019 00:16
2 SE 1/1/2019 01:15
2 SR 1/1/2019 01:16
2 SS 1/1/2019 01:17
2 SS 1/1/2019 01:18
For each row, I want the next name that differs from the current one, looking only within the same id group. The expected output is:
id name date_and_hour next_name
1 SS 1/1/2019 00:12 SB
1 SS 1/1/2019 00:13 SB
1 SS 1/1/2019 00:14 SB
1 SB 1/1/2019 00:15 SS
1 SS 1/1/2019 00:16 null
2 SE 1/1/2019 01:15 SR
2 SR 1/1/2019 01:16 SS
2 SS 1/1/2019 01:17 SR
2 SR 1/1/2019 01:18 null
Please advise.
CodePudding user response:
Example
import pandas as pd

data = {'id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2},
        'name': {0: 'SS', 1: 'SS', 2: 'SS', 3: 'SB', 4: 'SE', 5: 'SR', 6: 'SR', 7: 'SS'}}
df = pd.DataFrame(data)
Code
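# Keep the last row of each (id, name) pair, then look one row ahead within each id.
df['next_name'] = df.drop_duplicates(subset=['id', 'name'], keep='last').groupby('id')['name'].shift(-1)
# Back-fill within each id so the earlier rows of a run get the same next name.
df['next_name'] = df.groupby('id')['next_name'].bfill()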
df
id name next_name
0 1 SS SB
1 1 SS SB
2 1 SS SB
3 1 SB NaN
4 2 SE SR
5 2 SR SS
6 2 SR SS
7 2 SS NaN
If the same name occurs in more than one separate run within an id, as below, the code needs a small change.
id name
1 SS
1 SS
1 SB
1 SB
1 SS
1 SS
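One possible way to handle that case, offered as a sketch rather than the answerer's own fix: look one row ahead within each id, keep that value only where it differs from the current name, then back-fill within the id. The frame below just reproduces the sample above; the intermediate name `nxt` is made up for illustration.

```python
import pandas as pd

# Sample from above: the name SS appears in two separate runs within id 1.
df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 1],
                   'name': ['SS', 'SS', 'SB', 'SB', 'SS', 'SS']})

# Name on the following row of the same id (NaN on the last row of each id).
nxt = df.groupby('id')['name'].shift(-1)

# Keep the lookahead only on rows where the name changes on the next row.
df['next_name'] = nxt.where(nxt.ne(df['name']))

# Back-fill within each id so every row of a run gets the name that follows the run.
df['next_name'] = df.groupby('id')['next_name'].bfill()

print(df)
```

Because this compares neighbouring rows instead of dropping duplicate names, a name that comes back later in the same id is treated as a new run. Rows in the final run of an id have nothing after them and stay NaN.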
CodePudding user response:
Try this code:
import pandas as pd

raw_data = {'id': [1, 1, 1, 1, 2, 2, 2, 2],
            'name': ['SS', 'SS', 'SS', 'SB', 'SE', 'SR', 'SR', 'SS']}
df = pd.DataFrame(raw_data, columns=['id', 'name'])

# The name -> next name mapping is hard-coded for this sample data.
next_name = []
for row in df['name']:
    if row == 'SS':
        next_name.append('SB')
    elif row == 'SE':
        next_name.append('SR')
    elif row == 'SR':
        next_name.append('SS')
    else:
        next_name.append(None)

# The last row has no following name.
next_name[-1] = None
df['next_name'] = next_name
print(df)
CodePudding user response:
This should work:
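def lookahead(s):
    # codes: integer label per row; uniques: distinct names in order of first appearance
    codes, uniques = pd.factorize(s)
    # Map code k to the name with code k + 1; the last distinct name has no successor and becomes NaN
    return pd.Series(codes).map(dict(enumerate(uniques[1:]))).tolist()

df.groupby('id')['name'].transform(lookahead)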
Output:
0 SB
1 SB
2 SB
3 NaN
4 SR
5 SS
6 SS
7 NaN