Select Random Value from Pandas list column for each row ensuring that value don't get picked a

Time:11-20

I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({
    'poc': ["a", "b", "c", "d"],
    'school': ["school1", "school2", "school3", "school4"],
    'volunteers': [["sam", "mat", "ali", "mike", "guy", "john"],
                   ["sam", "mat", "ali", "mike"],
                   ["rose", "sam", "mike", "jorge"],
                   ["susan", "jack", "alex", "mat", "mike"]]
})
poc  school   volunteers
a    school1  ['sam', 'mat', 'ali', 'mike', 'guy', 'john']
b    school2  ['sam', 'mat', 'ali', 'mike']
c    school3  ['rose', 'sam', 'mike', 'jorge']
d    school4  ['susan', 'jack', 'alex', 'mat', 'mike']

I need to create a new column that holds one randomly chosen volunteer per school, taken from that row's volunteers list, while ensuring that the same volunteer is not picked twice.

So far I have tried:

import random

df["random_match"] = [random.choice(x) for x in df["volunteers"]]

but this just gives me a random volunteer for each row, without ensuring that no volunteer is repeated.

CodePudding user response:

This should work. Accumulate the names picked so far and remove them from the set of available choices for each row. I am assuming a default value of pd.NA if no volunteer is left for a row.

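import random

df["random_match"] = pd.NA
already_picked = set()
for row_label in df.index:
    # Names in this row that have not been assigned to an earlier row
    available_group = set(df.at[row_label, "volunteers"]) - already_picked
    if available_group:
        # random.sample() no longer accepts a set (Python 3.11+), so choose from a sorted list
        chosen_name = random.choice(sorted(available_group))
        df.loc[row_label, "random_match"] = chosen_name
        already_picked.add(chosen_name)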
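
The same greedy idea can be sketched as a small helper that does not depend on the DataFrame's index. This is only a sketch under assumptions: the name pick_unique and the optional rng parameter are hypothetical (not from the answer above), and a row is given None when every name in its list has already been used.

```python
import random

import pandas as pd


def pick_unique(volunteer_lists, rng=None):
    """Greedily pick one not-yet-used name per list; None if none is left."""
    rng = rng or random.Random()
    already_picked = set()
    picks = []
    for names in volunteer_lists:
        # Keep the list order (minus used names) so a seeded rng is reproducible.
        available = [n for n in dict.fromkeys(names) if n not in already_picked]
        choice = rng.choice(available) if available else None
        if choice is not None:
            already_picked.add(choice)
        picks.append(choice)
    return picks


df = pd.DataFrame({
    "school": ["school1", "school2", "school3", "school4"],
    "volunteers": [["sam", "mat", "ali", "mike", "guy", "john"],
                   ["sam", "mat", "ali", "mike"],
                   ["rose", "sam", "mike", "jorge"],
                   ["susan", "jack", "alex", "mat", "mike"]],
})
# Seeded only to make the example repeatable; drop rng= for a fresh pick each run.
df["random_match"] = pick_unique(df["volunteers"], rng=random.Random(0))
```

Because the pick is greedy, earlier rows get first choice; a later row can end up with None even when a different order of picks would have filled every row.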

CodePudding user response:

You could try this:

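from random import shuffle

selected = []
# Assumes the default RangeIndex, so the position i is also a valid row label
for i, list_of_volunteers in enumerate(df["volunteers"].values):
    # shuffle() reorders the list in place, so the "volunteers" column changes too
    shuffle(list_of_volunteers)
    for volunteer in list_of_volunteers:
        # Take the first shuffled name that no earlier row has used
        if volunteer not in selected:
            df.loc[i, "pick"] = volunteer
            selected.append(volunteer)
            break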

print(df)
# Outputs
  poc   school                        volunteers  pick
0   a  school1  [mat, ali, mike, john, sam, guy]   mat
1   b  school2             [mike, sam, ali, mat]  mike
2   c  school3          [sam, mike, rose, jorge]   sam
3   d  school4    [mike, jack, alex, mat, susan]  jack
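
Since the result is random, it can help to verify it rather than check it by eye. A minimal sketch, assuming a hypothetical helper named picks_are_valid (not part of either answer), applied to the values from the printed output above:

```python
import pandas as pd


def picks_are_valid(volunteer_lists, picks):
    """True if every non-missing pick is in its own row's list and none repeats."""
    seen = set()
    for names, pick in zip(volunteer_lists, picks):
        if pd.isna(pick):
            continue  # row left unmatched
        if pick not in names or pick in seen:
            return False
        seen.add(pick)
    return True


# Data copied from the printed output above.
df = pd.DataFrame({
    "volunteers": [["mat", "ali", "mike", "john", "sam", "guy"],
                   ["mike", "sam", "ali", "mat"],
                   ["sam", "mike", "rose", "jorge"],
                   ["mike", "jack", "alex", "mat", "susan"]],
    "pick": ["mat", "mike", "sam", "jack"],
})
print(picks_are_valid(df["volunteers"], df["pick"]))  # True
```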