Home > Blockchain >  Join two dataframes on the values present in a specific column in the name_data dataframe using koal
Join two dataframes on the values present in a specific column in the name_data dataframe using koal

Time:02-16

I am trying to join two the dataframes as shown below on the code column values present in the name_data dataframe.

I have two dataframes shown below and I expect to have a resulting dataframe which would only have the rows from the `team_datadataframe where the correspondingcodevalue column is present in thename_data``` dataframe.

I am using koalas for this on databricks and I have the following code using the join operation.

import databricks.koalas as ks

name_data= ks.DataFrame({'code':['123a', '345b', '678c'],
        'id':[1, 2, 3]})

team_data = ks.DataFrame({'code':['123a', '23s', '34a'],
        'id':[1, 2, 3]})

team_data_filtered = team_data.join(name_data.set_index('code'), on='code')

display(team_data_filtered)

The expected output would be to see only the following in team_data_filtered.

Code    id
'123a'   1

But my code is throwing an error stating that columns overlap but no suffix specified: ['id'].

May someone help to resolve this issue?

CodePudding user response:

Try adding suffix parameters:

team_data_filtered = team_data.join(name_data.set_index('code'), on='code', 
                                                lsuffix='_1', rsuffix='_2')
team_data_filtered = team_data_filtered.loc[team_data_filtered.id_1==team_data_filtered.id2]
display(team_data_filtered)   

An then to clean the columns, if desired:

team_data_filtered.rename({'id_1':'id'}, inplace=True, axis=1)

                                                                     
  • Related