Creating a Pandas DataFrame as a cross-product of family x city x member

Time:12-10

Sorry if this seems like a simple question, but I am new to Python. I would like to create a DataFrame containing 10 family names and 10 cities of birth and, for each (family name, city of birth) pair, 3 members of that family, each with a "name" that is a random string of up to 8 characters. How can I create such a DataFrame? I don't really know how to reuse the same (family name, city of birth) pair for more than one "member" value.

CodePudding user response:

There are a few ways to go about this, but here is a simple one that is easy to follow (with 5 values instead of the required 10, but you get the idea):

import random
import string

import pandas as pd

cities = ["New York", "London", "Paris", "Beijing", "Casablanca"]
names = ["Smith", "Heston", "Dupont", "Torvalds", "Clooney"]

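# Build one row (a dict) per family member, then let pandas turn the list into a DataFrame.
df = pd.DataFrame(
    [
        {
            # cities[i] is paired with names[i], so each family has exactly one city here
            "city": cities[i],
            "family_name": names[i],
            # random lowercase string of exactly 8 characters
            "first_name": "".join([random.choice(string.ascii_lowercase) for _ in range(8)]),
        }
        for i in range(5)  # one (city, family name) pair per index
        for _ in range(3)  # repeat each pair for 3 members
    ]
)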

print(df)
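
Note that the snippet above pairs each city with one family name by index, which gives 5 x 3 = 15 rows. If what you want is the full cross-product from the title (every family name combined with every city, 3 members each), here is a minimal sketch using itertools.product. The helper names random_name and build_members and the two constants are made up for this illustration, and the name length is drawn between 1 and 8 on the assumption that "up to 8 characters" means a variable length:

```python
import itertools
import random
import string

import pandas as pd

cities = ["New York", "London", "Paris", "Beijing", "Casablanca"]
names = ["Smith", "Heston", "Dupont", "Torvalds", "Clooney"]

MEMBERS_PER_PAIR = 3   # assumption: 3 members per (family name, city) pair
MAX_NAME_LENGTH = 8    # assumption: "up to 8 characters" means 1 to 8


def random_name(max_length=MAX_NAME_LENGTH):
    """Return a random lowercase string of 1 to max_length characters."""
    length = random.randint(1, max_length)
    return "".join(random.choices(string.ascii_lowercase, k=length))


def build_members(family_names, cities, members_per_pair=MEMBERS_PER_PAIR):
    """Build one row per member for every (family name, city) combination."""
    rows = [
        {"family_name": family, "city": city, "first_name": random_name()}
        # itertools.product yields every (family, city) pair
        for family, city in itertools.product(family_names, cities)
        # repeat each pair once per member
        for _ in range(members_per_pair)
    ]
    return pd.DataFrame(rows)


df_cross = build_members(names, cities)
print(df_cross.shape)  # (75, 3): 5 families x 5 cities x 3 members, 3 columns
```

With 10 family names and 10 cities, the same function would give 10 x 10 x 3 = 300 rows.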