I have a very large table (below) which im looking to split up in order to reduce load on my DB. Ideally I would take a wide table and create multiple small tables. Im using python 3 and pandas!
Name | Age | City |
---|---|---|
Tom | 20 | London |
Josh | 18 | Paris |
Mike | 20 | Paris |
From the table above Im looking to create these 3 tables
Name | Age | City |
---|---|---|
Tom | 1 | 1 |
Josh | 2 | 2 |
Mike | 1 | 2 |
ID | Age |
---|---|
1 | 20 |
2 | 18 |
ID | City |
---|---|
1 | London |
2 | Paris |
Thanks in advance!
CodePudding user response:
This is actually pretty simple. For each column, create a mapping of unique items to the unique items' indexes.
cols = ['Age', 'City']
mappings = {col: dict(zip(df[col].unique(), np.arange(df[col].nunique()) 1)) for col in cols}
inverted_mappings = {col: {v: k for k, v in mp.items()} for col, mp in mappings.items()}
Now, use .replace
to apply the lookup dictionaries to the dataframe:
>>> df
Name Age City
0 Tom 20 London
1 Josh 18 Paris
2 Mike 20 Paris
>>> new_df = df.replace(mappings)
>>> new_df
Name Age City
0 Tom 1 1
1 Josh 2 2
2 Mike 1 2
>>> new_df.replace(inverted_mappings)
Name Age City
0 Tom 20 London
1 Josh 18 Paris
2 Mike 20 Paris
CodePudding user response:
you can use Categorical, here is the example:
df[['Age','City']] = df[['Age','City']].astype('category')
df = df.assign(Age_id=df.Age.cat.codes 1, City_id=df.City.cat.codes 1)
Name Age City Age_id City_id
0 Tom 20 London 2 1
1 Josh 18 Paris 1 2
2 Mike 20 Paris 2 2
age_id = df[['Age_id','Age']].drop_duplicates().set_index('Age_id').sort_index()
Age
Age_id
1 18
2 20
city_id = df[['City_id','City']].drop_duplicates().set_index('City_id').sort_index()
City
City_id
1 London
2 Paris