Home > Software design >  How to merge a big dataframe with small dataframe?
How to merge a big dataframe with small dataframe?

Time:10-07

I have a big dataframe with 100 rows and the structure is [qtr_dates<datetime.date>, sales<float>] and a small dataframe with same structure with less than 100 rows. I want to merge these two dfs such that merged df will have all the rows from small df and remaining rows will be taken from big df.

Right now I am doing this

df = big_df.merge(small_df, on=big_df.columns.tolist(), how='outer')

But this is creating a df with duplicate qtr_dates.

CodePudding user response:

Use concat with remove duplicates by DataFrame.drop_duplicates:

pd.concat([small_df, big_df], ignore_index=True).drop_duplicates(subset=['qtr_dates'])

CodePudding user response:

If I understand correctly, you want everything from the bigger dataframe, but if that date is present in the smaller data frame you would want it replaced by the relevant value from the smaller one?

Hence I think you want to do this:

df = big_df.merge(small_df, on=big_df.columns.tolist(),how='left',indicator=True)
df = df[df._merge!= "both"]
df_out = pd.concat([df,small_df],ignore_index=True)

This would remove any rows from the big_df which exist in the small_df in the 2nd step, before then adding the small_df rows by concatenating rather than merging. If you had more column names that weren't involved with the join you'd have to do some column renaming/dropping though I think.

Hope that's right.

CodePudding user response:

Try maybe join instead of merge.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html

  • Related