Let's say I have a dictionary:
{
    'us': {
        'male': {'given_names': ['Alex', 'Bob', 'Charlie']},
        'female': {'given_names': ['Alice', 'Betty', 'Claire']},
    },
    'uk': {
        'male': {'given_names': ['aaa', 'Bbb', 'cc']},
        'female': {'given_names': ['ppp', 'ddd', 'sss']},
    },
}
Now let's say I want to draw 60% US names and 40% UK names, with a 50/50 split between male and female names.
How can I do it?
My current approach is something like the code below, but I suspect the problem is more complex than that.
I was thinking of collecting all the names first and then applying a distribution to them, but that doesn't make logical sense to me. Can someone help?
# all_possible_names = [
#     name
#     for list_of_names in [
#         self.library[area][gender]["given_names"]
#         for gender in self.genders
#         for area in self.name_areas
#     ]
#     for name in list_of_names
# ]
# print(all_possible_names)
Thanks.
CodePudding user response:
Use random.choices with weights to pick the country and random.choice to split between male and female. Assuming your dictionary is named d and N is the total number of names you'd like:
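from random import choice, choices

N = 3
names = [
    choice(d[country][choice(['male', 'female'])]['given_names'])
    # k=N draws one weighted country per name; without k, choices() returns
    # a single country and every name would come from it
    for country in choices(['us', 'uk'], weights=[0.6, 0.4], k=N)
]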
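To check that a draw like this follows the requested split, one option is to sample many names and count the countries and genders. This is only a minimal sketch: it assumes the dictionary from the question is named d, and the helper sample_names and the fixed seed are hypothetical names introduced for illustration.

```python
import random
from collections import Counter

# Dictionary from the question, assumed here to be named d.
d = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}


def sample_names(n, rng):
    """Hypothetical helper: draw n (country, gender, name) triples."""
    # One weighted country draw per name (k=n), then a fair gender pick.
    countries = rng.choices(["us", "uk"], weights=[0.6, 0.4], k=n)
    result = []
    for country in countries:
        gender = rng.choice(["male", "female"])
        name = rng.choice(d[country][gender]["given_names"])
        result.append((country, gender, name))
    return result


rng = random.Random(0)  # fixed seed only so the check is repeatable
sample = sample_names(10_000, rng)
country_counts = Counter(country for country, _, _ in sample)
gender_counts = Counter(gender for _, gender, _ in sample)
print(country_counts["us"] / len(sample))   # should be close to 0.6
print(gender_counts["male"] / len(sample))  # should be close to 0.5
```

The proportions are only approximate for any finite sample, because each name is drawn independently; they get closer to 60/40 and 50/50 as the sample grows.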
CodePudding user response:
You can use numpy's random.choice to do the weighted country selection:
from numpy.random import choice as npchoice
from random import choice

some_dict = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}

possible_choices = ["us", "uk"]
probability_distribution = [0.6, 0.4]
number_of_items_to_pick = 200

# Draw one country per name according to the 60/40 weights
countries = list(
    npchoice(possible_choices, number_of_items_to_pick, p=probability_distribution)
)
print(countries)

names = []
females = 0
males = 0
for country in countries:
    # Pick the gender with equal probability
    gender = choice(["male", "female"])
    if gender == "female":
        females += 1
    else:
        males += 1
    name = choice(some_dict[country][gender]["given_names"])
    names.append(name)
    print(f"{country} | {gender:.1} | {name}")

print(f"\nF: {females} | M: {males}")
print(f"US: {countries.count('us')} | UK: {countries.count('uk')}")
I added some logic above for my testing, and to check the distribution.
It can be shortened to the logic below:
from numpy.random import choice as npchoice
from random import choice

names = [
    choice(some_dict[country][choice(["male", "female"])]["given_names"])
    for country in npchoice(["us", "uk"], 200, p=[0.6, 0.4])
]
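Random draws like these only hit 60/40 and 50/50 on average. If the split has to be exact, one possible alternative (not part of the answer above) is to fix the counts per country and gender first and then shuffle. A rough sketch using only the standard library; the helper exact_split is a hypothetical name, and some_dict is assumed to be the dictionary defined earlier:

```python
import random

# Same dictionary as above, repeated so the sketch is self-contained.
some_dict = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}


def exact_split(n, rng):
    """Hypothetical helper: fixed 60/40 country quota, half male and half female."""
    us_count = round(n * 0.6)
    quotas = {"us": us_count, "uk": n - us_count}
    picks = []
    for country, count in quotas.items():
        n_male = count // 2  # if count is odd, the extra slot goes to "female"
        genders = ["male"] * n_male + ["female"] * (count - n_male)
        for gender in genders:
            name = rng.choice(some_dict[country][gender]["given_names"])
            picks.append((country, gender, name))
    rng.shuffle(picks)  # so the output is not grouped by country and gender
    return picks


picks = exact_split(200, random.Random())
us_total = sum(1 for country, _, _ in picks if country == "us")
male_total = sum(1 for _, gender, _ in picks if gender == "male")
print(f"US: {us_total} | UK: {len(picks) - us_total}")
print(f"M: {male_total} | F: {len(picks) - male_total}")
```

The names themselves are still chosen at random within each country and gender group; only the group sizes are fixed.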