Unable to merge datasets

Time:12-06

I have scraped data from two different pharma websites, so I have two datasets in hand:

First Dataset

Second Dataset

Both datasets have a Name column in common, and I am trying to combine them. My final objective is to get all the data from the first dataset, plus the product descriptions from the second dataset wherever the name is the same in both tables.

I tried using information from GeeksforGeeks: Merge Code and result

but it gives me 106 rows, which matches the older dataset, whereas I need 251 rows, as in new_df.

Can anyone suggest what I am doing wrong here?

CodePudding user response:

If you want to keep the length of new_df, I would suggest using the how='left' argument in

pd.merge(new_df, match_data, on="Name", how="left")

This does a left join, so every row of new_df is kept.

Based on the screenshots you shared, I would also double-check that the "Name" columns of both dataframes actually have names in common.
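One way to run that check is sketched below. The two small dataframes and the Price and Description columns are made-up stand-ins for the scraped data, and the idea that names differ only by case or stray whitespace is an assumption, not something visible in the question. The indicator=True option of pd.merge adds a _merge column that shows which rows of new_df found a partner.

```python
import pandas as pd

# Hypothetical stand-ins for the two scraped datasets.
new_df = pd.DataFrame(
    {"Name": ["Aspirin", "Ibuprofen ", "paracetamol"], "Price": [1.0, 2.0, 3.0]}
)
match_data = pd.DataFrame(
    {"Name": ["Aspirin", "Ibuprofen", "Paracetamol"], "Description": ["a", "b", "c"]}
)

# Left join keeps every row of new_df; _merge says whether a match was found.
merged = pd.merge(new_df, match_data, on="Name", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "Name"].tolist()
print(unmatched)  # names from new_df with no exact match in match_data

# Assumption: mismatches come from case or whitespace, so join on a cleaned key.
def normalise(names: pd.Series) -> pd.Series:
    return names.str.strip().str.lower()

left = new_df.assign(name_key=normalise(new_df["Name"]))
right = match_data.assign(name_key=normalise(match_data["Name"])).drop(columns="Name")
merged_norm = left.merge(right, on="name_key", how="left")
print(merged_norm[["Name", "Description"]])
```

If the list of unmatched names is long, the join key probably needs cleaning before the merge; if it is empty, the names already line up.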

CodePudding user response:

Did you try these?

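# Option 1: keep only the names present in both dataframes
desc_df1 = pd.merge(new_df, match_data, on='Name', how='inner')
# Option 2: keep every row of new_df, matched or not
desc_df1 = pd.merge(new_df, match_data, on='Name', how='left')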

After trying these options, let us know how it goes, because I could not work out the cause from your data preview. Can you also run value_counts() on the Name column of each dataframe, sort the result, and check whether there are any duplicates in either one? If so, that could be why you have this problem.
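A minimal sketch of that duplicate check follows. The data is invented for illustration and duplicate_counts is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Hypothetical stand-ins for the two scraped datasets.
new_df = pd.DataFrame({"Name": ["A", "B", "B", "C"]})
match_data = pd.DataFrame(
    {"Name": ["A", "A", "B"], "Description": ["a1", "a2", "b"]}
)

def duplicate_counts(df: pd.DataFrame, column: str = "Name") -> pd.Series:
    """Return only the values of `column` that occur more than once."""
    counts = df[column].value_counts()
    return counts[counts > 1]

dup_left = duplicate_counts(new_df)
dup_right = duplicate_counts(match_data)
print(dup_left)
print(dup_right)

# Duplicate names on the right-hand side multiply rows in the result.
merged = pd.merge(new_df, match_data, on="Name", how="left")
print(len(new_df), len(merged))
```

In this toy example the left join returns more rows than new_df has, because the name "A" appears twice in match_data and so matches twice.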

CodePudding user response:

If you want to keep the size of the first dataframe constant, you need to use a left join. Rows with no match get null values in the columns that come from the second dataframe, but the row count stays the same (provided the names in the second dataframe are unique).

Also remember that the first parameter of the merge method is the dataframe whose size you want to keep constant when 'how' is 'left'.
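The points above can be sketched as follows, with made-up data and hypothetical Price and Description columns. Dropping duplicate names from the second dataframe is one possible choice here, not necessarily the right one for the real data; validate="many_to_one" asks pandas to raise an error if the right-hand names are not unique.

```python
import pandas as pd

# Hypothetical stand-ins for the two scraped datasets.
new_df = pd.DataFrame({"Name": ["A", "B", "C"], "Price": [1, 2, 3]})
match_data = pd.DataFrame(
    {"Name": ["A", "A", "D"], "Description": ["first", "second", "other"]}
)

# Assumption: when a name repeats in match_data, the first description is enough.
right_unique = match_data.drop_duplicates(subset="Name", keep="first")

# new_df goes first, so it is the "left" dataframe whose rows are all kept.
result = pd.merge(
    new_df,
    right_unique,
    on="Name",
    how="left",
    validate="many_to_one",  # fails loudly if right-hand names are not unique
)

print(len(new_df), len(result))
print(result)
```

Names "B" and "C" have no partner in match_data, so their Description is null, while the result still has exactly as many rows as new_df.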
