Home > Back-end >  how to repeat same values as first row for entire dataframe and append with other dataframes
how to repeat same values as first row for entire dataframe and append with other dataframes

Time:06-08

I have multiple dataframes with different shape (length). I have to combine them as one.

For example,

df1:

version   Col1   Col2
1.0.0     234    456


df2:
ID    Col1_df2   Col2_df2   Col3_df2
1     aaaa       dfgr       1234
2     bbbb       njogn      7890
3     cccc       gmhlj      3456


df3:
Col1_df3    Col2_df3   Col3_df3
qwe         dfg        dffd

Firstly , I would like to look for maximum length dataframe which is df2 here . that many times , records have to be repeated for df1 and df3 . Then combine all of them in one dataframe.

final_df
version   Col1    Col2   ID  Col1_df2   Col2_df2   Col3_df2   Col1_df3   Col2_df3  Col3_df3
1.0.0     234    456     1     aaaa       dfgr       1234     qwe         dfg        dffd
1.0.0     234    456     2     bbbb       njogn      7890     qwe         dfg        dffd
1.0.0     234    456     3     cccc       gmhlj      3456     qwe         dfg        dffd

I am thinking of something like this:

1. look for maximum length of dataframe ,here it is len(df2).
2. loop through other dataframes in this range and repeat the rows .
3. append all the files into one. 

I am trying to loop througn using set_index and stack but I am missing something.

Can anyone please help me ?

CodePudding user response:

I was able to accomplish this using pd.merge() and ffill()

df_merge = pd.merge(df1, df2, left_index=True, right_index=True, how = 'outer')
df_merge = pd.merge(df_merge, df3, left_index=True, right_index=True, how = 'outer')
df_merge = df_merge.ffill()
df_merge

CodePudding user response:

Here is my sample code. You can try

df1_extend = pd.concat([df1]*int(df2.shape[0]//df3.shape[0]), ignore_index=True)
df3_extend = pd.concat([df3]*int(df2.shape[0]//df3.shape[0]), ignore_index=True)
final_df = pd.concat((df1_extend, df2, df3_extend), axis=1)

EDIT

if df_1.shape[0]=1 and df_3.shape[0]=1, you can do it faster

final_df = pd.concat((df1, df2, df3), axis=1)
final_df.ffill()

CodePudding user response:

You can try using

final_df = pd.concat([df1,df2,df3],axis=1)

And then doing a ffill like this

final_df.ffill(axis = 0)

That oughta fill the na values and make sure you get equal values for each shaped columns

CodePudding user response:

Here is one approach, which is to introduce a key column in all DF and then merging them together

df1['key'] = 1
df2['key'] = 1
df3['key'] = 1

df1.merge(df2, on='key').merge(df3, on='key')
version Col1 Col2   key     ID  Col1_df2    Col2_df2    Col3_df2    Col1_df3    Col2_df3    Col3_df3
0   1.0.0 234 456   1       1   aaaa       dfgr     1234    qwe     dfg     dffd
1   1.0.0 234 456   1       2   bbbb       njogn    7890    qwe     dfg     dffd
2   1.0.0 234 456   1       3   cccc       gmhlj    3456    qwe     dfg     dffd
  • Related