Let's say I have
DF1=pd.DataFrame([[1,2,3],[2,3,4],[3,4,5]], columns=['a','b','c'])
DF2=pd.DataFrame([[9,8,7]],columns=['x','y','z'])
I want to add all the columns of DF2 to DF1 but I want to repeat the rows of DF2 to fill all the rows of DF1
I can think of 3 ways to do it, none of them seem satisfactory.
With a for loop, one column at a time
for col in ['x','y','z']:
DF1[col]=DF2[col]
Creating a dummy column and then merging the two against that dummy variable
DF1=DF1.assign(dummy=1).merge(DF2.assign(dummy=1)).drop('dummy',axis=1)
which is faster (with my real data) than the for loop but looks really weird.
Creating a new DF2 with rows equal to DF1's rows:
DF1[['x','y','z']]=pd.concat([DF2 for x in range(DF1.shape[0])])
This one errors out saying cannot reindex on an axis with duplicate labels
but it doesn't really matter because just the pd.concat([DF2 for x in range(DF1.shape[0])])
is disqualifying slow.
Is there a syntax that's more like this but that actually works and is better than the merge?
DF1['x','y','z']=DF2
CodePudding user response:
Are you asking for a cross join?
DF1.join(DF2, how='cross')
this is roughly equivalent to pd.concat([DF1,DF2], axis=1).ffill().astype(int)
Output
a b c x y z
0 1 2 3 9 8 7
1 2 3 4 9 8 7
2 3 4 5 9 8 7
CodePudding user response:
What about something like this:
DF1[DF2.columns] = DF2
DF1 = DF1.fillna(method = "ffill")
Or
DF1[DF2.columns] = DF2.loc[[0]*3, :].values
You can (dinamically) change the 3
for the size of DF1
.