How to concatenate two dataframes when some values are duplicated?

Time:07-29

I have two dataframes of unequal lengths. I want to combine them with a condition.

If two rows of df1 are identical, they must share the same value from df2, and the row order of df1 must not change.

import pandas as pd
d = {'country': ['France', 'France','Japan','China', 'China','Canada','Canada','India']}
df1 = pd.DataFrame(data=d)
I={'conc': [0.30, 0.25, 0.21, 0.37, 0.15]}
df2 = pd.DataFrame(data=I)
dfc=pd.concat([df1,df2], axis=1)

My output:
    country conc
0   France  0.30
1   France  0.25
2   Japan   0.21
3   China   0.37
4   China   0.15
5   Canada  NaN
6   Canada  NaN
7   India   NaN


Expected output:
    country conc
0   France  0.30
1   France  0.30
2   Japan   0.25
3   China   0.21
4   China   0.21
5   Canada  0.37
6   Canada  0.37
7   India   0.15

CodePudding user response:

You need to create a link between the values and the countries first.

df2["country"] = df1["country"].unique()

Then you can merge it with your original dataframe.

pd.merge(df1, df2, on="country")

But be aware that this only works if df2 holds exactly one value per distinct country (otherwise the assignment fails with a length mismatch) and the values are listed in the order in which the countries first appear in df1.
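The caveat above can be checked explicitly before combining. The following is a minimal sketch, not part of the original answer: it swaps the merge for a lookup with Series.map, and the names countries and lookup are illustrative only. It assumes the values in df2 follow the order in which the countries first appear in df1.

```python
import pandas as pd

df1 = pd.DataFrame({"country": ["France", "France", "Japan", "China",
                                "China", "Canada", "Canada", "India"]})
df2 = pd.DataFrame({"conc": [0.30, 0.25, 0.21, 0.37, 0.15]})

# Distinct countries, in order of first appearance in df1.
countries = df1["country"].unique()

# Guard the assumption: exactly one value per distinct country.
if len(countries) != len(df2):
    raise ValueError(
        f"expected {len(countries)} values, got {len(df2)}"
    )

# Lookup table country -> conc, paired by position (assumed ordering).
lookup = pd.Series(df2["conc"].to_numpy(), index=countries)

# Map every row of df1 to its country's value; row order is unchanged.
dfc = df1.assign(conc=df1["country"].map(lookup))
print(dfc)
```

Because map works row by row on df1, the result keeps the original index and order regardless of how the join would sort.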

CodePudding user response:

I'd construct the dataframe directly, without intermediate dataframes.

import pandas as pd

d = {'country': ['France', 'France','Japan','China', 'China','Canada','Canada','India']}
I = {'conc': [0.30, 0.25, 0.21, 0.37, 0.15]}
c = 'country'

# index the values by unique country, then reindex by the full (repeating) country list
dfc = pd.DataFrame(I, index=pd.Index(d[c], name=c).unique()).reindex(d[c]).reset_index()
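The chained expression above packs three steps into one line. As a rough sketch of the same idea unpacked (the intermediate names unique_idx, keyed and expanded are illustrative, and the rename_axis call is an addition that makes the index name explicit):

```python
import pandas as pd

countries = ["France", "France", "Japan", "China",
             "China", "Canada", "Canada", "India"]
values = [0.30, 0.25, 0.21, 0.37, 0.15]

# Step 1: unique countries in order of first appearance.
unique_idx = pd.Index(countries, name="country").unique()

# Step 2: one row per distinct country, keyed by the country name.
keyed = pd.DataFrame({"conc": values}, index=unique_idx)

# Step 3: look up every original row by label; repeated labels
# are allowed in the target because the index itself is unique.
expanded = keyed.reindex(countries).rename_axis("country")

# Step 4: turn the country index back into an ordinary column.
dfc = expanded.reset_index()
print(dfc)
```

Step 2 raises an error if the number of values differs from the number of distinct countries, so a length mismatch surfaces early instead of producing NaN.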