I have two dataframes of the same length with a shared column called post_id. They look like this:
df1:
post_id | text |
---|---|
001 | some text 1 |
002 | some text 2 |
003 | some text 3 |
... | ... |
999 | some text 999 |
df2:
post_id | text |
---|---|
001 | different text 1 |
002 | different text 2 |
003 | different text 3 |
... | ... |
999 | different text 999 |
What I want is a new dataframe with half of the rows randomly selected from df1 and the other half from df2, with every post_id still present and no duplicates. Is there a way to do this short of manually selecting the rows with iloc?
CodePudding user response:
If both dataframes have the same columns and the same index, use DataFrame.update (which modifies df1 in place) with DataFrame.sample:
df1.update(df2.sample(frac=0.5, replace=False))
print(df1)
   post_id                text
0      1.0    different text 1
1      2.0         some text 2
2      3.0         some text 3
3    999.0  different text 999
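
If you would rather keep df1 unchanged, one possible alternative (not the approach from the answer above) is to sample row labels once and stitch the two frames together with pd.concat. This is only a sketch: the helper name mix_half and its seed parameter are made up for illustration, and it assumes both frames share the same unique index. It should also leave post_id with its original dtype, unlike the float values visible in the output above.

```python
import pandas as pd


def mix_half(df1, df2, seed=None):
    """Hypothetical helper: take about half the rows from df2, the rest from df1.

    Assumes df1 and df2 have identical, unique index labels.
    """
    # Randomly choose half of the row labels, without replacement.
    picked = df1.sample(frac=0.5, replace=False, random_state=seed).index
    # Rows with the picked labels come from df2, all remaining rows from df1.
    mixed = pd.concat([df2.loc[picked], df1.drop(index=picked)])
    # Restore the original row order.
    return mixed.sort_index()


df1 = pd.DataFrame({
    "post_id": ["001", "002", "003", "999"],
    "text": ["some text 1", "some text 2", "some text 3", "some text 999"],
})
df2 = pd.DataFrame({
    "post_id": ["001", "002", "003", "999"],
    "text": ["different text 1", "different text 2",
             "different text 3", "different text 999"],
})

mixed = mix_half(df1, df2, seed=0)
print(mixed)
```

Because every label ends up in exactly one of the two pieces, each post_id appears once in the result. Which rows come from df2 depends on the seed; pass seed=None for a different split on every call.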