I have the following pandas DataFrame:
import pandas as pd

df = pd.DataFrame({
    'poc': ["a", "b", "c", "d"],
    'school': ["school1", "school2", "school3", "school4"],
    'volunteers': [["sam", "mat", "ali", "mike", "guy", "john"],
                   ["sam", "mat", "ali", "mike"],
                   ["rose", "sam", "mike", "jorge"],
                   ["susan", "jack", "alex", "mat", "mike"]]
})
poc | school | volunteers |
---|---|---|
a | school1 | ['sam', 'mat', 'ali', 'mike', 'guy', 'john'] |
b | school2 | ['sam', 'mat', 'ali', 'mike'] |
c | school3 | ['rose', 'sam', 'mike', 'jorge'] |
d | school4 | ['susan', 'jack', 'alex', 'mat', 'mike'] |
I need to create a new column that holds one volunteer picked at random from each school's volunteers list, ensuring that no volunteer is picked for more than one school.
So far I have tried:
import random
df["random_match"] = [random.choice(x) for x in df["volunteers"]]
but this just gives me a random volunteer without ensuring it is not repeated.
CodePudding user response:
This should work. Keep track of the names picked so far and remove them from the set of available choices for each row. I am assuming a default value of pd.NA when no volunteer is left for a school.
import random

df["random_match"] = pd.NA
already_picked = set()
for row_idx in df.index:
    # Volunteers of this school who have not been assigned yet
    available_group = set(df.at[row_idx, "volunteers"]) - already_picked
    if available_group:
        # random.sample() rejects sets on Python 3.11+, so choose from a sorted list
        chosen_name = random.choice(sorted(available_group))
        df.loc[row_idx, "random_match"] = chosen_name
        already_picked.add(chosen_name)
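The same accumulate-and-exclude idea can be sketched without pandas, which makes it easier to test. The function name `pick_unique` and its `seed` parameter are illustrative assumptions, not part of the answer above; the returned list could then be assigned to a column such as `df["random_match"]`.

```python
import random


def pick_unique(volunteer_lists, seed=None):
    """Pick one volunteer per list, never reusing a name (hypothetical helper)."""
    rng = random.Random(seed)  # passing a seed makes the picks reproducible
    already_picked = set()
    picks = []
    for volunteers in volunteer_lists:
        # Names on this list that no earlier school has taken
        available = [name for name in volunteers if name not in already_picked]
        if available:
            chosen = rng.choice(available)
            already_picked.add(chosen)
        else:
            chosen = None  # nobody left for this school
        picks.append(chosen)
    return picks


volunteer_lists = [
    ["sam", "mat", "ali", "mike", "guy", "john"],
    ["sam", "mat", "ali", "mike"],
    ["rose", "sam", "mike", "jorge"],
    ["susan", "jack", "alex", "mat", "mike"],
]
picks = pick_unique(volunteer_lists, seed=0)
print(picks)
```

Because the schools are processed in order, an earlier pick can use up the only name available to a later school; in that case the later entry is `None`.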
CodePudding user response:
You could try this:
from random import shuffle

selected = []
for i, list_of_volunteers in enumerate(df["volunteers"].values):
    # Shuffles the list in place, so the order stored in df changes too
    shuffle(list_of_volunteers)
    for volunteer in list_of_volunteers:
        # Take the first volunteer not already assigned to another school
        if volunteer not in selected:
            df.loc[i, "pick"] = volunteer
            selected.append(volunteer)
            break
print(df)
# Output (one possible result, since the picks are random)
   poc   school                        volunteers  pick
0    a  school1  [mat, ali, mike, john, sam, guy]   mat
1    b  school2             [mike, sam, ali, mat]  mike
2    c  school3          [sam, mike, rose, jorge]   sam
3    d  school4    [mike, jack, alex, mat, susan]  jack
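Both loops in this thread pick greedily, row by row, so a later school can end up with no volunteer even though a valid assignment exists (for example, lists `["x", "y"]` and `["x"]` when the first school takes `"x"`). A minimal sketch of one way around that, assuming it is acceptable to swap in a randomised bipartite matching (augmenting paths); the name `match_unique` is hypothetical, and the result is random but not guaranteed to be uniformly distributed over all valid assignments.

```python
import random


def match_unique(volunteer_lists, seed=None):
    """Give each list a distinct volunteer where possible (hypothetical helper)."""
    rng = random.Random(seed)
    owner = {}  # volunteer name -> index of the school currently holding them

    def try_assign(school, visited):
        # Try candidates in random order so the outcome varies between runs
        candidates = list(volunteer_lists[school])
        rng.shuffle(candidates)
        for name in candidates:
            if name in visited:
                continue
            visited.add(name)
            # Take a free volunteer, or move the current holder to another option
            if name not in owner or try_assign(owner[name], visited):
                owner[name] = school
                return True
        return False

    for school in range(len(volunteer_lists)):
        try_assign(school, set())

    picks = [None] * len(volunteer_lists)
    for name, school in owner.items():
        picks[school] = name
    return picks


# Greedy picking can leave the second school empty here; matching cannot.
print(match_unique([["x", "y"], ["x"]]))  # → ['y', 'x']
```

As with the greedy loops, a school only stays at `None` when no assignment can cover every school at once, for instance two schools that both list a single shared name.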