import pandas as pd

df = pd.DataFrame(data={
    'job_id': ['0001', '0001', '0001', '0002', '0002', '0002', '0003', '0003', '0004', '0004', '0005', '0005', '0005', '0005', '0006'],
    'user_id': ['frank', 'frank', 'frank', 'jessica', 'jessica', 'jessica', 'eric', 'eric', 'james', 'james', 'josh', 'josh', 'josh', 'josh', 'sam']
})
job_id user_id
0 0001 frank
1 0001 frank
2 0001 frank
3 0002 jessica
4 0002 jessica
5 0002 jessica
6 0003 eric
7 0003 eric
8 0004 james
9 0004 james
10 0005 josh
11 0005 josh
12 0005 josh
13 0005 josh
14 0006 sam
Output:
job_id user_id validator_id
0 0001 frank jessica
1 0001 frank jessica
2 0001 frank jessica
3 0002 jessica eric
4 0002 jessica eric
5 0002 jessica eric
6 0003 eric james
7 0003 eric james
8 0004 james sam
9 0004 james sam
10 0005 josh frank
11 0005 josh frank
12 0005 josh frank
13 0005 josh frank
14 0006 sam josh
The desired output should follow the format above.
Answer:
IIUC, you could use a set difference combined with random.choice:
import random

# all distinct users; each group picks a validator from everyone except itself
users = set(df['user_id'])
df['validator_id'] = (df.groupby('user_id')['user_id']
                        .transform(lambda x: random.choice(list(users.difference(x))))
                      )
example output:
job_id user_id validator_id
0 0001 frank sam
1 0001 frank sam
2 0001 frank sam
3 0002 jessica eric
4 0002 jessica eric
5 0002 jessica eric
6 0003 eric frank
7 0003 eric frank
8 0004 james eric
9 0004 james eric
10 0005 josh james
11 0005 josh james
12 0005 josh james
13 0005 josh james
14 0006 sam eric
Note that with this approach the same validator can be selected for several users, or never selected at all.
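To see how uneven the assignment can be, you can count how many users each validator ends up with. This is only a minimal sketch: it rebuilds the question's df so it runs on its own, the seed value and the names per_user, counts and never_picked are illustrative, and it swaps list(...) for sorted(...) so that a fixed seed gives the same result on every run (the iteration order of a set of strings can change between interpreter runs).

```python
import random
from collections import Counter

import pandas as pd

df = pd.DataFrame(data={
    'job_id': ['0001', '0001', '0001', '0002', '0002', '0002', '0003', '0003',
               '0004', '0004', '0005', '0005', '0005', '0005', '0006'],
    'user_id': ['frank', 'frank', 'frank', 'jessica', 'jessica', 'jessica',
                'eric', 'eric', 'james', 'james', 'josh', 'josh', 'josh',
                'josh', 'sam'],
})

random.seed(0)  # arbitrary seed, only to make the run repeatable
users = set(df['user_id'])
df['validator_id'] = (
    df.groupby('user_id')['user_id']
      # sorted() instead of list() so the candidate order is deterministic
      .transform(lambda x: random.choice(sorted(users.difference(x))))
)

# keep one row per user, then count how many users each validator received
per_user = df.drop_duplicates('user_id')
counts = Counter(per_user['validator_id'])
never_picked = users - set(counts)

print(counts)        # validators with the number of users assigned to them
print(never_picked)  # users that validate nobody in this draw
```

If every user has to validate exactly one other user, the independent draws above are not enough and a rotation is the safer choice.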
Alternatively, if you want a random rotation, in which every user validates exactly one other user:
from random import shuffle
users = list(set(df['user_id']))
shuffle(users)
d = dict(zip(users, users[-1:] + users[:-1]))
# {'james': 'frank', 'josh': 'james', 'eric': 'josh',
# 'jessica': 'eric', 'sam': 'jessica', 'frank': 'sam'}
df['validator_id'] = df['user_id'].map(d)
output:
job_id user_id validator_id
0 0001 frank sam
1 0001 frank sam
2 0001 frank sam
3 0002 jessica eric
4 0002 jessica eric
5 0002 jessica eric
6 0003 eric josh
7 0003 eric josh
8 0004 james frank
9 0004 james frank
10 0005 josh james
11 0005 josh james
12 0005 josh james
13 0005 josh james
14 0006 sam jessica
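The rotation can also be wrapped in a small helper that is repeatable and guards against the one edge case where it breaks: with a single distinct user, the rotated list equals the original and that user would validate themselves. The sketch below is an assumption-laden illustration, not part of the answer above: assign_rotation, its seed parameter and the ValueError are hypothetical names and choices.

```python
import random

import pandas as pd


def assign_rotation(df, seed=None):
    """Return a copy of df with a validator_id column from a shuffled rotation.

    Hypothetical helper: the name and the seed parameter are illustrative.
    """
    rng = random.Random(seed)
    # sorted() gives a stable starting order, so a given seed is repeatable
    users = sorted(set(df['user_id']))
    if len(users) < 2:
        # a rotation of one element would map the user to themselves
        raise ValueError('need at least two distinct users')
    rng.shuffle(users)
    # pair each user with the previous one in the shuffled order
    mapping = dict(zip(users, users[-1:] + users[:-1]))
    return df.assign(validator_id=df['user_id'].map(mapping))


df = pd.DataFrame(data={
    'job_id': ['0001', '0001', '0001', '0002', '0002', '0002', '0003', '0003',
               '0004', '0004', '0005', '0005', '0005', '0005', '0006'],
    'user_id': ['frank', 'frank', 'frank', 'jessica', 'jessica', 'jessica',
                'eric', 'eric', 'james', 'james', 'josh', 'josh', 'josh',
                'josh', 'sam'],
})

out = assign_rotation(df, seed=42)
print(out)
```

Because the mapping is a single cycle over all users, nobody validates their own jobs and every user appears as a validator exactly once.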