How to assign random values from list to new column that doesn't exist in another column of the

Time:03-05

import pandas as pd

df = pd.DataFrame(data={
    'job_id': ['0001', '0001', '0001', '0002', '0002', '0002', '0003', '0003', '0004', '0004', '0005', '0005', '0005', '0005', '0006'],
    'user_id': ['frank', 'frank', 'frank', 'jessica', 'jessica', 'jessica', 'eric', 'eric', 'james', 'james','josh','josh','josh','josh','sam']
})

   job_id  user_id
0   0001   frank
1   0001   frank
2   0001   frank
3   0002   jessica
4   0002   jessica
5   0002   jessica
6   0003   eric
7   0003   eric
8   0004   james
9   0004   james
10  0005   josh
11  0005   josh
12  0005   josh
13  0005   josh
14  0006   sam

Output:

   job_id  user_id  validator_id
0   0001   frank    jessica
1   0001   frank    jessica
2   0001   frank    jessica
3   0002   jessica  eric
4   0002   jessica  eric
5   0002   jessica  eric
6   0003   eric     james
7   0003   eric     james
8   0004   james    sam
9   0004   james    sam
10  0005   josh     frank
11  0005   josh     frank
12  0005   josh     frank
13  0005   josh     frank
14  0006   sam      josh

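The desired output should follow the format above: each validator_id is picked at random from the user_id values, but is never the same as the user_id in that row.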

CodePudding user response:

IIUC, you could use a set difference combined with random.choice:

import random

users = set(df['user_id'])

df['validator_id'] = (df.groupby('user_id')['user_id']
                       .transform(lambda x: random.choice(list(users.difference(x))))
                     )

Example output:

   job_id  user_id validator_id
0    0001    frank          sam
1    0001    frank          sam
2    0001    frank          sam
3    0002  jessica         eric
4    0002  jessica         eric
5    0002  jessica         eric
6    0003     eric        frank
7    0003     eric        frank
8    0004    james         eric
9    0004    james         eric
10   0005     josh        james
11   0005     josh        james
12   0005     josh        james
13   0005     josh        james
14   0006      sam         eric

Note that each user's validator is drawn independently, so the same validator can be selected many times, or never.
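To see that behaviour concretely, here is a minimal sketch that draws one validator per user and counts how often each user was picked. It is a variation rather than the exact code above: it uses a dict comprehension with `map` instead of `groupby.transform`, and the seed value, the shortened sample frame, and the names `rng`, `picks` and `counts` are assumptions made for illustration.

```python
import random
from collections import Counter

import pandas as pd

# Shortened sample data (hypothetical subset of the frame in the question).
df = pd.DataFrame({
    'job_id': ['0001', '0001', '0002', '0003', '0004'],
    'user_id': ['frank', 'frank', 'jessica', 'eric', 'james'],
})

rng = random.Random(0)              # arbitrary seed, only for repeatable runs
users = sorted(set(df['user_id']))  # sorted so the candidate order is stable

# One independent draw per user, excluding the user themselves.
picks = {u: rng.choice([v for v in users if v != u]) for u in users}
df['validator_id'] = df['user_id'].map(picks)

# How many users each validator was assigned to; some may be missing entirely.
counts = Counter(picks.values())
print(df)
print(counts)
```

Because every draw is independent, nothing stops `counts` from being lopsided; if an even workload matters, the rotation approach is the better fit.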


Alternatively, if you want a random rotation, in which every user validates exactly one other user:

from random import shuffle
users = list(set(df['user_id']))
shuffle(users)

# pair each user with the previous one in the shuffled list (wrapping around)
d = dict(zip(users, users[-1:] + users[:-1]))
# {'james': 'frank', 'josh': 'james', 'eric': 'josh',
#  'jessica': 'eric', 'sam': 'jessica', 'frank': 'sam'}

df['validator_id'] = df['user_id'].map(d)

Output:

   job_id  user_id validator_id
0    0001    frank          sam
1    0001    frank          sam
2    0001    frank          sam
3    0002  jessica         eric
4    0002  jessica         eric
5    0002  jessica         eric
6    0003     eric         josh
7    0003     eric         josh
8    0004    james        frank
9    0004    james        frank
10   0005     josh        james
11   0005     josh        james
12   0005     josh        james
13   0005     josh        james
14   0006      sam      jessica
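The rotation idea can also be wrapped in a small helper, sketched below. The function name `rotate_validators`, its `seed` parameter, and the guard for fewer than two users are assumptions added for illustration; they are not part of the answer above. With a single distinct user the rotation would map that user to themselves, which is why the sketch raises instead.

```python
import random

import pandas as pd


def rotate_validators(user_ids, seed=None):
    """Map each distinct user to a different user via a shuffled rotation."""
    users = sorted(set(user_ids))  # sorted so a given seed gives a stable result
    if len(users) < 2:
        # A lone user could only be rotated onto themselves.
        raise ValueError("need at least two distinct users")
    random.Random(seed).shuffle(users)
    # Shift the shuffled list by one position so nobody maps to themselves.
    return dict(zip(users, users[-1:] + users[:-1]))


df = pd.DataFrame({
    'job_id': ['0001', '0001', '0002', '0003', '0004', '0005', '0006'],
    'user_id': ['frank', 'frank', 'jessica', 'eric', 'james', 'josh', 'sam'],
})

mapping = rotate_validators(df['user_id'], seed=42)  # seed value is arbitrary
df['validator_id'] = df['user_id'].map(mapping)
print(df)
```

Since the mapping is a one-position shift of a list with no duplicates, every user appears exactly once as a validator and never validates their own jobs.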