How to join two pandas dataframe such that the second table repeats-CodePudding

I have tow dataframe df1 & df2:

df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
  )

df1

    A   B   C   D
0   A0  B0  C0  D0
1   A1  B1  C1  D1
2   A2  B2  C2  D2
3   A3  B3  C3  D3

df2

df2 = pd.DataFrame(
    {
        "E": ["A4"],
        "F": ["B4"],
        "G": ["C4"],
        
    },
)

df2

       E    F   G
    0   A4  B4  C4

When I do a join only the first row of df2 is populated as it is single row and rest are given as NULL

output = df1.join(df2)

    A   B   C   D   E   F   G
0   A0  B0  C0  D0  A4  B4  C4
1   A1  B1  C1  D1  NaN NaN NaN
2   A2  B2  C2  D2  NaN NaN NaN
3   A3  B3  C3  D3  NaN NaN NaN

Is there a way to join such that instead of NaN the row repeats in the rest of rows as below

Desired Output

    A   B   C   D   E   F   G
0   A0  B0  C0  D0  A4  B4  C4
1   A1  B1  C1  D1  A4  B4  C4
2   A2  B2  C2  D2  A4  B4  C4
3   A3  B3  C3  D3  A4  B4  C4

CodePudding user response：

First idea is only forward filling missing values, not working if some missing values in df1 - also replace them:

output = df1.join(df2).ffill()
print (output)
    A   B   C   D   E   F   G
0  A0  B0  C0  D0  A4  B4  C4
1  A1  B1  C1  D1  A4  B4  C4
2  A2  B2  C2  D2  A4  B4  C4
3  A3  B3  C3  D3  A4  B4  C4

Or use DataFrame.assign with unpack Series created by first column of Dataframe - disadvatage is it failed if integers columns names in df2:

output = df1.assign(**df2.iloc[0])

Or use DataFrame.reindex for same index in both DataFrames, also are repeated values by method='ffill' parameter:

output = df1.join(df2.reindex(df1.index, method='ffill'))

CodePudding user response：

Adding to the answer from jezrael, you can also use the entire single row of df2 as an input to fillna, and it will apply to the appropriate columns:

output = df1.join(df2).fillna(df2.iloc[0])

CodePudding user response：

Use merge, with how = 'cross':

 df1.merge(df2,how = 'cross')

    A   B   C   D   E   F   G
0  A0  B0  C0  D0  A4  B4  C4
1  A1  B1  C1  D1  A4  B4  C4
2  A2  B2  C2  D2  A4  B4  C4
3  A3  B3  C3  D3  A4  B4  C4

If your data is fairly large, you could use expand_grid from pyjanitor for more performance:

# pip install pyjanitor
import pandas as pd
import janitor

(df1
.expand_grid(
    df_key = 'df1', 
    others = {'df2':df2})
.droplevel(axis = 1 , level = 0)
)
    A   B   C   D   E   F   G
0  A0  B0  C0  D0  A4  B4  C4
1  A1  B1  C1  D1  A4  B4  C4
2  A2  B2  C2  D2  A4  B4  C4
3  A3  B3  C3  D3  A4  B4  C4

Do keep in mind that this is a Cartesian product, so be careful about memory consumption