Home > OS >  Join values in different dataframes
Join values in different dataframes

Time:05-08

I am trying trying to join two dataframes in such a way that the resulting union contains info about both of them. My dataframes are similar to:

>> df_1
user_id    hashtag1      hashtag2     hashtag3
0000       '#breakfast'  '#lunch'     '#dinner'
0001       '#day'        '#night'     NaN
0002       '#breakfast'  NaN          NaN

The second dataframe contains a unique identifier of the hashtags and their respective score:

>> df_2
hashtag1       score
'#breakfast'  10
'#lunch'      8
'#dinner'     9
'#day'        -5
'#night'      6

I want to add a set of columns on my first dataframe that contain the scores of each hashtag used, such as:

user_id    hashtag1      hashtag2     hashtag3    score1    score2    score3
0000       '#breakfast'  '#lunch'     '#dinner'   10        8         9
0001       '#day'        '#night'     NaN         -5        6         NaN
0002       '#breakfast'  NaN          NaN         10        NaN       NaN

I tried to use df.join() but I get an error: "ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat"

My code is as follows: new_df = df_1.join(df_2, how='left', on='hashtag1')

I appreciate any help, thank you

CodePudding user response:

You should try pandas.merge:

pandas.merge(df_1, df_2, on='hashtag1', how='left')

If you want to use .join, you need to set the index of df_2.

df_1.join(df_2.set_index('hashtag1'), on='hashtag1', how='left')

Some resources:

  • Related