Home > database >  pandas merge dataframes with same columns and reference its id
pandas merge dataframes with same columns and reference its id

Time:05-26

I have a problem with python pandas. I have serveral different dataframes which I want to split up into an SQLite Database. My first Dataframe to_country:

Dataframe 1

to_country = df[['country']]
to_country = to_country.rename(columns={'country': 'Name'})
to_country = to_country.drop_duplicates()
#Add Index to Country Dataframe
to_country.insert(0, 'Id', range(1, 1   len(to_country)))
to_country=to_country.reindex(columns=['Id','Name'])
to_country = to_country.set_index('Id')
to_country.to_sql('Country', con=con, if_exists='append', index=True)```

This part works fine. Now i have another Dataframe to_state which looks like that:

Dataframe 2

to_state = df[['state','country']]
to_state = to_state.rename(columns={'state': 'Name'})
to_state = to_state.drop_duplicates()
to_state.insert(0, 'Id', range(1, 1   len(to_state)))
to_state=to_state.reindex(columns=['Id','Name','country'])
to_state = to_state.set_index('Id')

Now I want to replace the Country USA with the Id from the previous Dataframe, i want it to look like that:

Note the CountryId should be the attribute Id from the dataframe to_country Id___Name___CountryId

1_____CA_________1

I tried following Statement but which only resulted in:

to_state = pd.merge(to_state, to_country, left_on='country', right_on="Name")

Result

I really do not know how should i solve this. what is even more irritating, I don't know why the Colums Id from both Dataframes disappear.

CodePudding user response:

As I don't have your example dataframe, test this

import pandas as pd

to_country = pd.DataFrame({"id": [20,30],
                           "country": ['USA','CHINA']})
to_state = pd.DataFrame({"id": [90,80],
                           "state": ['CA','AB'],
                         "country": ['USA','CHINA']})
print(f'__________ORIGINALS DATAFRAMES__________ \n##STATE##\n{to_state}\n\n###COUNTRY###\n{to_country}')

def func(line):
    t = 0
    for x in range(0, len(to_country['country'])):
        t = to_country.loc[to_state['country'] == line['country']]
    t = t.values.tolist()
    return t[0][0]

print(f'\n_________FINAL DATAFRAME__________\n')
to_state['ID_NEW_country'] = to_state.apply(func, axis = 1)

print(f' \n{to_state}')

CodePudding user response:

I solved it like that in the end:

#Add Countries to Database
to_country = df[['country']]
to_country = to_country.rename(columns={'country': 'Name'})
to_country = to_country.drop_duplicates()
#Add Index to Country Dataframe
to_country = to_country.reset_index()
to_country = to_country.rename(columns={"index":"ID"})
to_country['ID'] = to_country.index   1
to_country.set_index('ID').to_sql('Country', con=con, if_exists='append', index=True)


#Add States to Database
to_state = df[['state','country']]
to_state = to_state.drop_duplicates()
#Add Index to Country Dataframe
to_state = to_state.reset_index()
to_state = to_state.rename(columns={"index":"ID", 'state': 'Name'})
to_state['ID'] = to_state.index   1
to_state = to_state.merge(to_country, how='left', left_on='country', right_on='Name').drop(['country', 'Name_y'], axis= 1)
print(to_state)
to_state = to_state.rename(columns={'ID_x': 'ID', 'Name_x': 'Name', 'ID_y': 'Country_ID'})
print(to_state)
to_state.set_index('ID').to_sql('State', con=con, if_exists='append', index=True)
  • Related