Home > front end >  Merge two columns from multiple panda series dataframes based on string matching from two columns wi
Merge two columns from multiple panda series dataframes based on string matching from two columns wi

Time:06-17

I need to merge two columns from a pandas series dataframe together on the last 4 digits of the first column pack_number. I currently have 2 dataframes with a different number of columns.

enter image description here

enter image description here

So far, I thought about extracting the last 4 digits of the ROOT_VIN but I'm not sure how to proceed with column matching and merging these tables together.

Ideally, I'd like to merge the dataframe including pack_number (which includes 6 other columns with different values) into the dataframe including ROOT_VIN which also has other columns with different lengths. The goal is to include the pack_number and match it with its ROOT_VIN.

After merging, i now have 108k rows (matching the 108k rows for df2) and many of the rows are duplicated because df1 only had 451 rows. enter image description here

CodePudding user response:

If I understand right then you have 2 data frames with a lot of columns.

# df1 # pack_number Table
# df2 # ROOT_VIN Table

# @BeRT2me solution
df2['pack_number'] = df2['ROOT_VIN'].str[-4:]

# joining the 2 dataframes
# we add '_remove' to the duplicate columns names
df3 = df1.merge(df2, how='inner', on='pack_number', suffixes=('', '_remove'))

# remove the columns that end with '_remove'
df3.drop(
    [col_name for col_name in df_3.columns if '_remove' in col_name],
    axis=1, inplace=True
)
  • Related