Randomly merge two dataframes based on condition in Pandas

Time:12-29

I have two dataframes of the same length, with a shared column called post_id, that look like this:

df1:

post_id text
001 some text 1
002 some text 2
003 some text 3
... ...
999 some text 999

df2:

post_id text
001 different text 1
002 different text 2
003 different text 3
... ...
999 different text 999

What I want is a new dataframe with half of the rows randomly selected from df1 and the other half from df2, with every post_id still present and no duplicates. Is there a way to do this short of manually selecting the rows with iloc?

CodePudding user response:

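If both dataframes have the same columns and the same index, use DataFrame.update with DataFrame.sample. Note that update modifies df1 in place and can change integer columns to float, as the post_id column in the output shows: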

# Overwrite a random half of df1's rows (matched by index) with the rows from df2.
df1.update(df2.sample(frac=0.5, replace=False))
print(df1)
   post_id                text
0      1.0    different text 1
1      2.0         some text 2
2      3.0         some text 3
3    999.0  different text 999
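
If you would rather leave df1 untouched and keep post_id exactly as it was, a variation on the same idea is to pick a random half of the post_id values and overwrite only those rows in a copy. This is a minimal sketch, not the only way to do it: the four-row frames are made-up stand-ins for the real data, the names left, right, picked and result are illustrative, and random_state is there only to make the run repeatable.

```python
import pandas as pd

# Toy data standing in for the question's frames (values are illustrative).
df1 = pd.DataFrame({
    "post_id": ["001", "002", "003", "004"],
    "text": ["some text 1", "some text 2", "some text 3", "some text 4"],
})
df2 = pd.DataFrame({
    "post_id": ["001", "002", "003", "004"],
    "text": ["different text 1", "different text 2",
             "different text 3", "different text 4"],
})

# Align both frames on post_id so the row order of df2 does not matter.
left = df1.set_index("post_id")
right = df2.set_index("post_id")

# Randomly choose half of the post_ids; random_state is only for reproducibility.
picked = left.sample(frac=0.5, random_state=0).index

# Work on a copy of df1's rows and overwrite the chosen ones with df2's text.
result = left.copy()
result.loc[picked, "text"] = right.loc[picked, "text"]
result = result.reset_index()

print(result)
```

Because only the text column is assigned, post_id keeps its original dtype, and every post_id appears exactly once in result.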