Home > front end >  How to join two pandas dataframe such that the second table repeats
How to join two pandas dataframe such that the second table repeats

Time:04-21

I have tow dataframe df1 & df2:

df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
  )

df1

    A   B   C   D
0   A0  B0  C0  D0
1   A1  B1  C1  D1
2   A2  B2  C2  D2
3   A3  B3  C3  D3

df2

df2 = pd.DataFrame(
    {
        "E": ["A4"],
        "F": ["B4"],
        "G": ["C4"],
        
    },
)

df2

       E    F   G
    0   A4  B4  C4

When I do a join only the first row of df2 is populated as it is single row and rest are given as NULL

output = df1.join(df2)

    A   B   C   D   E   F   G
0   A0  B0  C0  D0  A4  B4  C4
1   A1  B1  C1  D1  NaN NaN NaN
2   A2  B2  C2  D2  NaN NaN NaN
3   A3  B3  C3  D3  NaN NaN NaN

Is there a way to join such that instead of NaN the row repeats in the rest of rows as below

Desired Output

    A   B   C   D   E   F   G
0   A0  B0  C0  D0  A4  B4  C4
1   A1  B1  C1  D1  A4  B4  C4
2   A2  B2  C2  D2  A4  B4  C4
3   A3  B3  C3  D3  A4  B4  C4

CodePudding user response:

First idea is only forward filling missing values, not working if some missing values in df1 - also replace them:

output = df1.join(df2).ffill()
print (output)
    A   B   C   D   E   F   G
0  A0  B0  C0  D0  A4  B4  C4
1  A1  B1  C1  D1  A4  B4  C4
2  A2  B2  C2  D2  A4  B4  C4
3  A3  B3  C3  D3  A4  B4  C4

Or use DataFrame.assign with unpack Series created by first column of Dataframe - disadvatage is it failed if integers columns names in df2:

output = df1.assign(**df2.iloc[0])

Or use DataFrame.reindex for same index in both DataFrames, also are repeated values by method='ffill' parameter:

output = df1.join(df2.reindex(df1.index, method='ffill'))

CodePudding user response:

Adding to the answer from jezrael, you can also use the entire single row of df2 as an input to fillna, and it will apply to the appropriate columns:

output = df1.join(df2).fillna(df2.iloc[0])

CodePudding user response:

Use merge, with how = 'cross':

 df1.merge(df2,how = 'cross')

    A   B   C   D   E   F   G
0  A0  B0  C0  D0  A4  B4  C4
1  A1  B1  C1  D1  A4  B4  C4
2  A2  B2  C2  D2  A4  B4  C4
3  A3  B3  C3  D3  A4  B4  C4

If your data is fairly large, you could use expand_grid from pyjanitor for more performance:

# pip install pyjanitor
import pandas as pd
import janitor

(df1
.expand_grid(
    df_key = 'df1', 
    others = {'df2':df2})
.droplevel(axis = 1 , level = 0)
)
    A   B   C   D   E   F   G
0  A0  B0  C0  D0  A4  B4  C4
1  A1  B1  C1  D1  A4  B4  C4
2  A2  B2  C2  D2  A4  B4  C4
3  A3  B3  C3  D3  A4  B4  C4

Do keep in mind that this is a Cartesian product, so be careful about memory consumption

  • Related