Create relationship lookup tables using pandas-CodePudding

I have a very large table (below) which im looking to split up in order to reduce load on my DB. Ideally I would take a wide table and create multiple small tables. Im using python 3 and pandas!

Name	Age	City
Tom	20	London
Josh	18	Paris
Mike	20	Paris

From the table above Im looking to create these 3 tables

Name	Age	City
Tom	1	1
Josh	2	2
Mike	1	2

ID	Age
1	20
2	18

ID	City
1	London
2	Paris

Thanks in advance!

CodePudding user response：

This is actually pretty simple. For each column, create a mapping of unique items to the unique items' indexes.

cols = ['Age', 'City']
mappings = {col: dict(zip(df[col].unique(), np.arange(df[col].nunique())   1)) for col in cols}
inverted_mappings = {col: {v: k for k, v in mp.items()} for col, mp in mappings.items()}

Now, use .replace to apply the lookup dictionaries to the dataframe:

>>> df
   Name  Age    City
0   Tom   20  London
1  Josh   18   Paris
2  Mike   20   Paris

>>> new_df = df.replace(mappings)
>>> new_df
   Name  Age  City
0   Tom    1     1
1  Josh    2     2
2  Mike    1     2

>>> new_df.replace(inverted_mappings)
   Name  Age    City
0   Tom   20  London
1  Josh   18   Paris
2  Mike   20   Paris

CodePudding user response：

you can use Categorical, here is the example:

df[['Age','City']] = df[['Age','City']].astype('category')
df = df.assign(Age_id=df.Age.cat.codes 1, City_id=df.City.cat.codes 1)

   Name Age    City  Age_id  City_id
0   Tom  20  London       2        1
1  Josh  18   Paris       1        2
2  Mike  20   Paris       2        2

age_id = df[['Age_id','Age']].drop_duplicates().set_index('Age_id').sort_index()

       Age
Age_id    
1       18
2       20

city_id = df[['City_id','City']].drop_duplicates().set_index('City_id').sort_index()

           City
City_id        
1        London
2         Paris