Picking from a dictionary by probability - CodePudding

Let's say I have a dictionary

{'us': 
     {'male': 
            {'given_names': 
                          ['Alex', 'Bob', 'Charlie'] 
            }, 
      'female': 
            {'given_names': 
                          ['Alice', 'Betty', 'Claire'] 
            } 
      },

'uk': 
     {'male': 
            {'given_names': 
                          ['aaa', 'Bbb', 'cc'] 
            }, 
      'female': 
            {'given_names': 
                          ['ppp', 'ddd', 'sss'] 
            } 
      }

}

Now let's say I want to get 60% US names and 40% UK names, with a 50/50 split between male and female names.

How can I do it?

Current approach? I was trying to think of something similar to this, but I guess it is more complex than that.

I was thinking of collecting all the names first and then applying a distribution to them, but that does not make logical sense to me. Can someone help?

        # all_possible_names = [
        #     name
        #     for list_of_names in [
        #         self.library[area][gender][
        #             "given_names"
        #         ]
        #         for gender in self.genders
        #         for area in self.name_areas
        #     ]
        #     for name in list_of_names
        # ]
        # print(all_possible_names)

Thanks.

CodePudding user response:

Use random.choices with weights to pick the country and random.choice to split between male and female. Assuming your dictionary is named d and N is the total number of names you'd like:

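from random import choice, choices

N = 3

# Draw one country per name (60% US, 40% UK) by passing k=N; without k,
# choices() returns a single country and every name would come from it.
# Then pick a gender at 50/50 and a name uniformly from that list.
names = [
    choice(d[country][choice(['male', 'female'])]['given_names'])
    for country in choices(['us', 'uk'], weights=[0.6, 0.4], k=N)
]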
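
To see whether this kind of sampling lands near the requested split, a rough check is to draw many names and tally the countries and genders. The sketch below is only an illustration: pick_names is a hypothetical helper, the seed and sample size are arbitrary, and it passes k to random.choices so that each name gets its own country draw.

```python
from collections import Counter
from random import Random

d = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}


def pick_names(data, n, country_weights, rng):
    """Hypothetical helper: return n (country, gender, name) tuples."""
    # One weighted country draw per requested name.
    countries = rng.choices(
        list(country_weights), weights=list(country_weights.values()), k=n
    )
    picks = []
    for country in countries:
        gender = rng.choice(["male", "female"])  # 50/50 on average
        name = rng.choice(data[country][gender]["given_names"])
        picks.append((country, gender, name))
    return picks


rng = Random(0)  # fixed seed only to make the run repeatable
picks = pick_names(d, 10_000, {"us": 0.6, "uk": 0.4}, rng)

country_counts = Counter(country for country, _, _ in picks)
gender_counts = Counter(gender for _, gender, _ in picks)
print(country_counts)
print(gender_counts)
```

The counts will hover around 60/40 and 50/50 but will not match them exactly, since every draw is independent.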

CodePudding user response:

You can use numpy's random.choice to apply the weighted distribution:

from numpy.random import choice as npchoice
from random import choice


some_dict = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}


possible_choices = ["us", "uk"]
probability_distribution = [0.6, 0.4]
number_of_items_to_pick = 200
countries = list(
    npchoice(possible_choices, number_of_items_to_pick, p=probability_distribution)
)
print(countries)


names = []
females = 0
males = 0
for country in countries:
    gender = choice(["male", "female"])
    if gender == "female":
        females += 1
    else:
        males += 1
    name = choice(some_dict[country][gender]["given_names"])
    names.append(name)
    print(f"{country} | {gender:.1} | {name}")


print(f"\nF: {females}  | M: {males}")
print(f"US: {countries.count('us')} | UK: {countries.count('uk')}")

I added some logic above for my testing, and to check the distribution.
It can be shortened to the logic below:

from numpy.random import choice as npchoice
from random import choice

names = [
    choice(some_dict[country][choice(["male", "female"])]["given_names"])
    for country in npchoice(["us", "uk"], 200, p=[0.6, 0.4])
]
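
Both answers sample randomly, so the 60/40 and 50/50 splits only hold on average. If the counts need to be exact (as far as rounding allows), one possible variation is to build the country and gender labels with fixed counts first and then shuffle. This is a sketch under that assumption, not something the question confirms: exact_split and pick_exact are hypothetical helpers, and the rounding is only meant for a small number of labels such as the two countries here.

```python
import random

some_dict = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}


def exact_split(shares, n):
    """Hypothetical helper: n labels with counts proportional to shares."""
    labels = list(shares)
    out = []
    for label in labels[:-1]:
        out += [label] * round(n * shares[label])
    # The last label absorbs whatever rounding left over.
    out += [labels[-1]] * (n - len(out))
    return out


def pick_exact(data, n, country_shares, rng):
    """Hypothetical helper: fixed country/gender counts, random names."""
    country_labels = exact_split(country_shares, n)
    picks = []
    for country in country_shares:
        k = country_labels.count(country)
        # Split each country's quota evenly between the genders.
        for gender in exact_split({"male": 0.5, "female": 0.5}, k):
            name = rng.choice(data[country][gender]["given_names"])
            picks.append((country, gender, name))
    rng.shuffle(picks)  # remove the grouping by country and gender
    return picks


picks = pick_exact(some_dict, 200, {"us": 0.6, "uk": 0.4}, random.Random())
us_total = sum(1 for country, _, _ in picks if country == "us")
male_total = sum(1 for _, gender, _ in picks if gender == "male")
print(f"US: {us_total} | UK: {len(picks) - us_total}")
print(f"M: {male_total} | F: {len(picks) - male_total}")
```

The names themselves are still chosen at random; only the country and gender counts are fixed.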