I have multiple dataframes with different shape (length). I have to combine them as one.
For example,
df1:
version Col1 Col2
1.0.0 234 456
df2:
ID Col1_df2 Col2_df2 Col3_df2
1 aaaa dfgr 1234
2 bbbb njogn 7890
3 cccc gmhlj 3456
df3:
Col1_df3 Col2_df3 Col3_df3
qwe dfg dffd
Firstly , I would like to look for maximum length dataframe which is df2 here . that many times , records have to be repeated for df1 and df3 . Then combine all of them in one dataframe.
final_df
version Col1 Col2 ID Col1_df2 Col2_df2 Col3_df2 Col1_df3 Col2_df3 Col3_df3
1.0.0 234 456 1 aaaa dfgr 1234 qwe dfg dffd
1.0.0 234 456 2 bbbb njogn 7890 qwe dfg dffd
1.0.0 234 456 3 cccc gmhlj 3456 qwe dfg dffd
I am thinking of something like this:
1. look for maximum length of dataframe ,here it is len(df2).
2. loop through other dataframes in this range and repeat the rows .
3. append all the files into one.
I am trying to loop througn using set_index and stack but I am missing something.
Can anyone please help me ?
CodePudding user response:
I was able to accomplish this using pd.merge()
and ffill()
df_merge = pd.merge(df1, df2, left_index=True, right_index=True, how = 'outer')
df_merge = pd.merge(df_merge, df3, left_index=True, right_index=True, how = 'outer')
df_merge = df_merge.ffill()
df_merge
CodePudding user response:
Here is my sample code. You can try
df1_extend = pd.concat([df1]*int(df2.shape[0]//df3.shape[0]), ignore_index=True)
df3_extend = pd.concat([df3]*int(df2.shape[0]//df3.shape[0]), ignore_index=True)
final_df = pd.concat((df1_extend, df2, df3_extend), axis=1)
EDIT
if df_1.shape[0]=1
and df_3.shape[0]=1
, you can do it faster
final_df = pd.concat((df1, df2, df3), axis=1)
final_df.ffill()
CodePudding user response:
You can try using
final_df = pd.concat([df1,df2,df3],axis=1)
And then doing a ffill like this
final_df.ffill(axis = 0)
That oughta fill the na values and make sure you get equal values for each shaped columns
CodePudding user response:
Here is one approach, which is to introduce a key column in all DF and then merging them together
df1['key'] = 1
df2['key'] = 1
df3['key'] = 1
df1.merge(df2, on='key').merge(df3, on='key')
version Col1 Col2 key ID Col1_df2 Col2_df2 Col3_df2 Col1_df3 Col2_df3 Col3_df3
0 1.0.0 234 456 1 1 aaaa dfgr 1234 qwe dfg dffd
1 1.0.0 234 456 1 2 bbbb njogn 7890 qwe dfg dffd
2 1.0.0 234 456 1 3 cccc gmhlj 3456 qwe dfg dffd