I have 2 pandas dataframes as below:
df1:
id name1
123 AAA
345 BBB
df2:
id_1 city address_1 address_2 province country mail
123 SFO - - - - -
234 TO - - - - -
345 NY - - - - -
I am merging df1 with df2 but in the result i only need the "city" column from df2. One way is I could do the merge and drop them as below
df_merged = pd.merge(df1,df2,left_on='id',right_on='id_1',how='left').drop(columns=['address_1',..])
But if the number of columns is more, it gets tedious to specify all those unnecessary columns in drop function. Just wondering is there an efficient way to specify only the columns that are needed from the merge.
Any suggestions would be appreciated.
CodePudding user response:
If all you need is the city column, you could just do:
df_merged = pd.merge(df1,df2,left_on='id',right_on='id_1',how='left')['City']
Of course, if you need more than that, you could add them. Just make sure you add a second second of brackets, as for >1 column you need to pass a list.
CodePudding user response:
You can select the relevant column from df2 during the merge step itself.
df_merged = pd.merge(df1,
df2[[ 'id_1', 'City']],
left_on='id',right_on='id_1',how='left')