I am looking for an algorithm that distributes people into 3 groups (a, b, c). The people in a group should fit together, meaning their food preferences must be compatible so that everyone in the group can agree on the same kind of food. Each group is divided into clusters (sub-groups) of 6 people. Let's say there are 4 types of food preference:
- The person likes to eat meat
- The person likes to eat vegetarian food
- The person likes to eat vegan food
- The person has no food preferences, which means the person basically likes to eat everything
I want to distribute the people into 3 logical groups:
- Group a: meat and no_food_preference
- Group b: vegan, vegetarian and no_food_preference
- Group c: vegetarian and no_food_preference
I use the people with no_food_preference to fill up the clusters, so that each cluster contains exactly 6 people.
After distributing all people into groups, each group should consist of a multiple of 6 people.
My problem: I tried very hard, but I cannot find an algorithm that does this for me. What I find especially hard is that the algorithm should handle any number of participants.
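To make the size constraint concrete, the arithmetic can be sketched roughly as follows. This assumes each preference is topped up separately; the helper name `fillers_needed` is only illustrative, and the counts are those of the example data in this question.

```python
from collections import Counter

CLUSTER_SIZE = 6


def fillers_needed(count, size=CLUSTER_SIZE):
    """People missing to bring `count` up to the next multiple of `size`."""
    return -count % size


# Counts taken from the example data: 12 meat, 24 vegetarian, 7 vegan, 11 without preference.
prefs = (["meat"] * 12 + ["vegetarian"] * 24 + ["vegan"] * 7
         + ["no_food_preference"] * 11)
counts = Counter(prefs)

needed = {p: fillers_needed(counts[p]) for p in ("meat", "vegetarian", "vegan")}
available = counts["no_food_preference"]

# The fill-up can only succeed if enough people without a preference exist.
feasible = sum(needed.values()) <= available

print(needed)    # {'meat': 0, 'vegetarian': 0, 'vegan': 5}
print(feasible)  # True
```

With these counts only the vegan pool is short (by 5), and the 6 remaining people without a preference happen to form one complete cluster of their own.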
Example:
import pandas as pd

df = pd.DataFrame(
    {
        "user_id": list(range(1, 55)),
        "Master_FoodPreference": ["meat", "vegetarian", "meat", "vegan", "meat", "vegetarian", "meat", "vegetarian", "no_food_preference",
                                  "meat", "no_food_preference", "vegetarian", "meat", "meat",
                                  "vegetarian", "vegetarian", "vegan", "vegetarian", "vegetarian", "no_food_preference", "vegan",
                                  "vegetarian", "vegetarian", "vegetarian", "vegetarian", "vegetarian", "vegetarian",
                                  "meat", "vegetarian", "meat", "vegetarian", "no_food_preference", "vegetarian", "vegetarian", "vegetarian",
                                  "vegetarian", "vegetarian", "vegetarian", "vegetarian", "vegetarian", "no_food_preference",
                                  "no_food_preference", "no_food_preference", "meat", "no_food_preference", "meat", "meat",
                                  "vegan", "no_food_preference", "no_food_preference", "vegan", "no_food_preference", "vegan", "vegan"]
    }
)
>>> df.head()
user_id Master_FoodPreference
0 1 meat
1 2 vegetarian
2 3 meat
3 4 vegan
4 5 meat
How would you group these people into group_a, group_b and group_c?
CodePudding user response:
This does not seem too difficult: classify the items by preference, then use items from "no_food_preference" to fill each of the other groups up to the next multiple of 6. If some items still remain in "no_food_preference", move them into another group (here, "meat"):
pref = ["meat", "vegetarian", "meat", "vegan", "meat", "vegetarian", "meat", "vegetarian", "no_food_preference",
"meat",'no_food_preference', 'vegetarian',"meat", "meat",
"vegetarian", "vegetarian", "vegan", "vegetarian", "vegetarian", "no_food_preference", "vegan",
"vegetarian", "vegetarian", "vegetarian", "vegetarian", "vegetarian", "vegetarian",
"meat", "vegetarian", "meat", "vegetarian", "no_food_preference", "vegetarian", "vegetarian", "vegetarian",
"vegetarian", "vegetarian", "vegetarian", "vegetarian", "vegetarian", "no_food_preference",
"no_food_preference", "no_food_preference", "meat", "no_food_preference", "meat", "meat",
"vegan", "no_food_preference", "no_food_preference", "vegan" ,"no_food_preference" ,"vegan" ,"vegan" ]
def assign_groups(pref):
    # Collect the indices of the people sharing each preference.
    groups = {}
    for i, p in enumerate(pref):
        if p in groups:
            groups[p].append(i)
        else:
            groups[p] = [i]
    # Top up each group to the next multiple of 6 with no-preference people.
    for p in ['meat', 'vegetarian', 'vegan']:
        need = len(groups[p]) % 6
        if need:
            for _ in range(6 - need):
                groups[p].append(groups["no_food_preference"].pop())
    # Any no-preference people still left join the meat group.
    if len(groups["no_food_preference"]):
        groups["meat"] += groups["no_food_preference"]
    del groups["no_food_preference"]
    return groups
assign_groups(pref)
{'meat': [0, 2, 4, 6, 9, 12, 13, 27, 29, 43, 45, 46, 8, 10, 19, 31, 40, 41], 'vegetarian': [1, 5, 7, 11, 14, 15, 17, 18, 21, 22, 23, 24, 25, 26, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39], 'vegan': [3, 16, 20, 47, 50, 52, 53, 51, 49, 48, 44, 42]}
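Of course, this only works if the total number of items is a multiple of 6 and there are enough "no_food_preference" items to fill every gap. Otherwise pop() raises an IndexError on the empty list, or the "meat" group ends up with an incomplete cluster.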
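For inputs that do not meet those conditions, one possible variant is sketched below. It is not the only reasonable policy: here, anyone who cannot be placed in a complete cluster is returned in a separate `leftover` list instead of raising an error. The names `assign_groups_checked`, `clusters` and `leftover` are made up for this sketch.

```python
def assign_groups_checked(pref, size=6, filler="no_food_preference",
                          order=("meat", "vegetarian", "vegan")):
    """Group indices by preference and return (groups, leftover).

    Every returned group has a length that is a multiple of `size`;
    people who cannot be placed end up in `leftover`.
    """
    groups = {p: [] for p in order}
    spare = []  # indices of people without a preference
    for i, p in enumerate(pref):
        if p == filler:
            spare.append(i)
        else:
            groups.setdefault(p, []).append(i)

    leftover = []
    for members in groups.values():
        need = -len(members) % size  # shortfall to the next multiple of `size`
        if need <= len(spare):
            for _ in range(need):
                members.append(spare.pop())
        else:
            # Not enough fillers: keep only the complete clusters.
            keep = len(members) - len(members) % size
            leftover.extend(members[keep:])
            del members[keep:]

    # Remaining fillers: complete clusters join the first group, the rest is left over.
    keep = len(spare) - len(spare) % size
    groups[order[0]].extend(spare[:keep])
    leftover.extend(spare[keep:])
    return groups, leftover


def clusters(members, size=6):
    """Split one group into consecutive clusters of `size` people."""
    return [members[i:i + size] for i in range(0, len(members), size)]


# Small example: 7 meat, 4 vegan, 3 without preference (14 people in total).
example = ["meat"] * 7 + ["vegan"] * 4 + ["no_food_preference"] * 3
groups, leftover = assign_groups_checked(example)
print(groups)    # {'meat': [0, 1, 2, 3, 4, 5], 'vegetarian': [], 'vegan': [7, 8, 9, 10, 13, 12]}
print(leftover)  # [6, 11]
```

In the example, the meat group would need 5 fillers but only 3 exist, so one meat eater is left over; the vegan group takes 2 fillers, and the last filler cannot form a cluster alone.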
CodePudding user response:
Create 4 dataframes: 3 for your groups (A, B, C) and 1 for the people with no food preference (X), then top up each of A, B and C with rows sampled from X if needed:
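# X: the pool of people without a food preference.
dfX = df[df['Master_FoodPreference'].eq('no_food_preference')]
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
# Each group is topped up with len(group) % 6 rows sampled at random from X.
dfA = df[df['Master_FoodPreference'].eq('meat')]
dfA = pd.concat([dfA, dfX.sample(len(dfA) % 6)])
dfB = df[df['Master_FoodPreference'].eq('vegan')
         | df['Master_FoodPreference'].eq('vegetarian')]
dfB = pd.concat([dfB, dfX.sample(len(dfB) % 6)])
dfC = df[df['Master_FoodPreference'].eq('vegetarian')]
dfC = pd.concat([dfC, dfX.sample(len(dfC) % 6)])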
Output:
>>> dfB
user_id Master_FoodPreference
1 2 vegetarian
3 4 vegan
5 6 vegetarian
7 8 vegetarian
11 12 vegetarian
14 15 vegetarian
15 16 vegetarian
16 17 vegan
17 18 vegetarian
18 19 vegetarian
20 21 vegan
21 22 vegetarian
22 23 vegetarian
23 24 vegetarian
24 25 vegetarian
25 26 vegetarian
26 27 vegetarian
28 29 vegetarian
30 31 vegetarian
32 33 vegetarian
33 34 vegetarian
34 35 vegetarian
35 36 vegetarian
36 37 vegetarian
37 38 vegetarian
38 39 vegetarian
39 40 vegetarian
47 48 vegan
50 51 vegan
52 53 vegan
53 54 vegan
49 50 no_food_preference
You wrote: "After distributing all people into groups, each group will consist of multiple of 6 people."
It's possible with your sample:
# Before append
>>> len(dfA), len(dfB), len(dfC), len(dfX)
(12, 31, 24, 11)
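Note that `len(dfB) % 6` is the remainder (31 % 6 = 1), whereas the number of rows needed to reach the next multiple of 6 is `-len(dfB) % 6` (5, giving 36). Separate `sample` calls can also hand the same person to more than one group. A rough sketch that addresses both points is shown below. It assumes non-overlapping groups, so no vegetarian is counted twice, and uses `pd.concat`, since `DataFrame.append` is no longer available in pandas 2.x. The function name `fill_groups`, the `pools` mapping and the small demo frame are illustrative only.

```python
import pandas as pd


def fill_groups(df, pools, col="Master_FoodPreference",
                filler="no_food_preference", size=6, seed=0):
    """Return {group_name: DataFrame}; each filler row is used at most once."""
    # Shuffle the filler rows once, then hand them out from the top.
    spare = df[df[col].eq(filler)].sample(frac=1, random_state=seed)
    result = {}
    for name, prefs in pools.items():
        members = df[df[col].isin(prefs)]
        need = -len(members) % size  # shortfall to the next multiple of `size`
        if need > len(spare):
            raise ValueError(f"not enough {filler} rows to complete group {name}")
        result[name] = pd.concat([members, spare.iloc[:need]])
        spare = spare.iloc[need:]  # drop the rows that were just handed out
    result["unassigned"] = spare
    return result


# Demo frame (assumed data): 12 meat, 7 vegan, 11 without preference.
demo = pd.DataFrame({
    "user_id": list(range(1, 31)),
    "Master_FoodPreference": ["meat"] * 12 + ["vegan"] * 7
                             + ["no_food_preference"] * 11,
})
groups = fill_groups(demo, {"a": ["meat"], "b": ["vegan"]})
print({name: len(g) for name, g in groups.items()})
# {'a': 12, 'b': 12, 'unassigned': 6}
```

The rows under `"unassigned"` can then be attached to whichever group should receive the remaining people, in complete clusters of 6.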