I am trying trying to join two dataframes in such a way that the resulting union contains info about both of them. My dataframes are similar to:
>> df_1
user_id hashtag1 hashtag2 hashtag3
0000 '#breakfast' '#lunch' '#dinner'
0001 '#day' '#night' NaN
0002 '#breakfast' NaN NaN
The second dataframe contains a unique identifier of the hashtags and their respective score:
>> df_2
hashtag1 score
'#breakfast' 10
'#lunch' 8
'#dinner' 9
'#day' -5
'#night' 6
I want to add a set of columns on my first dataframe that contain the scores of each hashtag
used, such as:
user_id hashtag1 hashtag2 hashtag3 score1 score2 score3
0000 '#breakfast' '#lunch' '#dinner' 10 8 9
0001 '#day' '#night' NaN -5 6 NaN
0002 '#breakfast' NaN NaN 10 NaN NaN
I tried to use df.join()
but I get an error: "ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat"
My code is as follows:
new_df = df_1.join(df_2, how='left', on='hashtag1')
I appreciate any help, thank you
CodePudding user response:
You should try pandas.merge
:
pandas.merge(df_1, df_2, on='hashtag1', how='left')
If you want to use .join
, you need to set the index of df_2
.
df_1.join(df_2.set_index('hashtag1'), on='hashtag1', how='left')
Some resources: