how to repeat same values as first row for entire dataframe and append with other dataframes-CodePudding

I have multiple dataframes with different shape (length). I have to combine them as one.

For example,

df1:

version   Col1   Col2
1.0.0     234    456


df2:
ID    Col1_df2   Col2_df2   Col3_df2
1     aaaa       dfgr       1234
2     bbbb       njogn      7890
3     cccc       gmhlj      3456


df3:
Col1_df3    Col2_df3   Col3_df3
qwe         dfg        dffd

Firstly , I would like to look for maximum length dataframe which is df2 here . that many times , records have to be repeated for df1 and df3 . Then combine all of them in one dataframe.

final_df
version   Col1    Col2   ID  Col1_df2   Col2_df2   Col3_df2   Col1_df3   Col2_df3  Col3_df3
1.0.0     234    456     1     aaaa       dfgr       1234     qwe         dfg        dffd
1.0.0     234    456     2     bbbb       njogn      7890     qwe         dfg        dffd
1.0.0     234    456     3     cccc       gmhlj      3456     qwe         dfg        dffd

I am thinking of something like this:

1. look for maximum length of dataframe ,here it is len(df2).
2. loop through other dataframes in this range and repeat the rows .
3. append all the files into one.

I am trying to loop througn using set_index and stack but I am missing something.

Can anyone please help me ?

CodePudding user response：

I was able to accomplish this using pd.merge() and ffill()

df_merge = pd.merge(df1, df2, left_index=True, right_index=True, how = 'outer')
df_merge = pd.merge(df_merge, df3, left_index=True, right_index=True, how = 'outer')
df_merge = df_merge.ffill()
df_merge

CodePudding user response：

Here is my sample code. You can try

df1_extend = pd.concat([df1]*int(df2.shape[0]//df3.shape[0]), ignore_index=True)
df3_extend = pd.concat([df3]*int(df2.shape[0]//df3.shape[0]), ignore_index=True)
final_df = pd.concat((df1_extend, df2, df3_extend), axis=1)

EDIT

if df_1.shape[0]=1 and df_3.shape[0]=1, you can do it faster

final_df = pd.concat((df1, df2, df3), axis=1)
final_df.ffill()

CodePudding user response：

You can try using

final_df = pd.concat([df1,df2,df3],axis=1)

And then doing a ffill like this

final_df.ffill(axis = 0)

That oughta fill the na values and make sure you get equal values for each shaped columns

CodePudding user response：

Here is one approach, which is to introduce a key column in all DF and then merging them together

df1['key'] = 1
df2['key'] = 1
df3['key'] = 1

df1.merge(df2, on='key').merge(df3, on='key')

version Col1 Col2   key     ID  Col1_df2    Col2_df2    Col3_df2    Col1_df3    Col2_df3    Col3_df3
0   1.0.0 234 456   1       1   aaaa       dfgr     1234    qwe     dfg     dffd
1   1.0.0 234 456   1       2   bbbb       njogn    7890    qwe     dfg     dffd
2   1.0.0 234 456   1       3   cccc       gmhlj    3456    qwe     dfg     dffd