I have tow dataframe df1 & df2:
df1 = pd.DataFrame(
{
"A": ["A0", "A1", "A2", "A3"],
"B": ["B0", "B1", "B2", "B3"],
"C": ["C0", "C1", "C2", "C3"],
"D": ["D0", "D1", "D2", "D3"],
},
)
df1
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
df2
df2 = pd.DataFrame(
{
"E": ["A4"],
"F": ["B4"],
"G": ["C4"],
},
)
df2
E F G
0 A4 B4 C4
When I do a join only the first row of df2 is populated as it is single row and rest are given as NULL
output = df1.join(df2)
A B C D E F G
0 A0 B0 C0 D0 A4 B4 C4
1 A1 B1 C1 D1 NaN NaN NaN
2 A2 B2 C2 D2 NaN NaN NaN
3 A3 B3 C3 D3 NaN NaN NaN
Is there a way to join such that instead of NaN the row repeats in the rest of rows as below
Desired Output
A B C D E F G
0 A0 B0 C0 D0 A4 B4 C4
1 A1 B1 C1 D1 A4 B4 C4
2 A2 B2 C2 D2 A4 B4 C4
3 A3 B3 C3 D3 A4 B4 C4
CodePudding user response:
First idea is only forward filling missing values, not working if some missing values in df1
- also replace them:
output = df1.join(df2).ffill()
print (output)
A B C D E F G
0 A0 B0 C0 D0 A4 B4 C4
1 A1 B1 C1 D1 A4 B4 C4
2 A2 B2 C2 D2 A4 B4 C4
3 A3 B3 C3 D3 A4 B4 C4
Or use DataFrame.assign
with unpack Series
created by first column of Dataframe - disadvatage is it failed if integers columns names in df2
:
output = df1.assign(**df2.iloc[0])
Or use DataFrame.reindex
for same index in both DataFrame
s, also are repeated values by method='ffill'
parameter:
output = df1.join(df2.reindex(df1.index, method='ffill'))
CodePudding user response:
Adding to the answer from jezrael, you can also use the entire single row of df2 as an input to fillna, and it will apply to the appropriate columns:
output = df1.join(df2).fillna(df2.iloc[0])
CodePudding user response:
Use merge
, with how = 'cross'
:
df1.merge(df2,how = 'cross')
A B C D E F G
0 A0 B0 C0 D0 A4 B4 C4
1 A1 B1 C1 D1 A4 B4 C4
2 A2 B2 C2 D2 A4 B4 C4
3 A3 B3 C3 D3 A4 B4 C4
If your data is fairly large, you could use expand_grid from pyjanitor for more performance:
# pip install pyjanitor
import pandas as pd
import janitor
(df1
.expand_grid(
df_key = 'df1',
others = {'df2':df2})
.droplevel(axis = 1 , level = 0)
)
A B C D E F G
0 A0 B0 C0 D0 A4 B4 C4
1 A1 B1 C1 D1 A4 B4 C4
2 A2 B2 C2 D2 A4 B4 C4
3 A3 B3 C3 D3 A4 B4 C4
Do keep in mind that this is a Cartesian product, so be careful about memory consumption