Home > Enterprise >  pandas: combine values in a list of dataframes of different sizes
pandas: combine values in a list of dataframes of different sizes

Time:02-18

I have a dataframe as follows,(but much bigger):

import pandas as pd
panda_list = [pd.DataFrame({'sent_a.1': [0, 3, 2, 1], 'sent_a.2': [0, 1, 4, 0], 'sent_a.3': [0, 6, 0, 8],'sent_a.4': [1, 1, 8, 6],'ID':['id_1_2','id_1_3','id_1_4','id_1_4']}),
              pd.DataFrame({'sent_a.1': [0, 3], 'sent_a.2': [0, 2], 'sent_a.3': [0, 6],'sent_a.4': [1, 1],'ID':['id_2_1','id_2_2']}),
              pd.DataFrame({'sent_a.1': [0, 3, 2, 1], 'sent_a.2': [0, 1, 4, 0], 'sent_a.3': [0, 6, 0, 8], 'sent_a.4': [1, 1, 8, 6], 'ID': ['id_1_2', 'id_1_3', 'id_1_4', 'id_1_4']}),
              ]

I would like to combine the pandas on similar columns into one dataframe, so the output would look like this,

desired output:
print(new_df)

   sent_a.1  sent_a.2  sent_a.3   sent_a.4      ID
0   0,0,0     0,0,0       0,0,0     1,1,1    id_1_2,id_2_1,id_3_1
1   3,3,3     1,2,1       6,6,6     1,1,1    id_1_3,id_2_2,id_3_2
2   2,NaN,2   4,NaN,4     0,NaN,0   8,NaN,8  id_1_4,NaN,id_3_3
3   1,NaN,1   0,NaN,0     8,NaN,8   6,NaN,6  id_1_4,NaN,id_3_4

So I have tried the following for example for the first columns but since the sizes are not the same, it send an error,

new_df=pd.DataFrame()

new_df['sent_a.1']=panda_list[0]['sent_a.1']   ','   panda_list[1]['sent_a.1']  ','  panda_list[2]['sent_a.1']

Note:

I do not want to concatenate or merge the columns though, i want to combine them. please look at the desired output.

CodePudding user response:

IIUC, you can concat and aggregate (assuming here as list).

dropping NaNs:
out = pd.concat(panda_list).groupby(level=0).agg(list)

output:

    sent_a.1   sent_a.2   sent_a.3   sent_a.4                        ID
0  [0, 0, 0]  [0, 0, 0]  [0, 0, 0]  [1, 1, 1]  [id_1_2, id_2_1, id_1_2]
1  [3, 3, 3]  [1, 2, 1]  [6, 6, 6]  [1, 1, 1]  [id_1_3, id_2_2, id_1_3]
2     [2, 2]     [4, 4]     [0, 0]     [8, 8]          [id_1_4, id_1_4]
3     [1, 1]     [0, 0]     [8, 8]     [6, 6]          [id_1_4, id_1_4]

To aggregate as string, ensure to convert to string first:

out = pd.concat(panda_list).astype(str).groupby(level=0).agg(','.join)

output:

  sent_a.1 sent_a.2 sent_a.3 sent_a.4                    ID
0    0,0,0    0,0,0    0,0,0    1,1,1  id_1_2,id_2_1,id_1_2
1    3,3,3    1,2,1    6,6,6    1,1,1  id_1_3,id_2_2,id_1_3
2      2,2      4,4      0,0      8,8         id_1_4,id_1_4
3      1,1      0,0      8,8      6,6         id_1_4,id_1_4
ensuring not to lose NaNs

concatenate on the other axis and perform a double transpose

out = (
 pd.concat(panda_list, axis=1).astype(str)
   .T.groupby(level=0, sort=False)
   .agg(','.join).T
)

output:

  sent_a.1 sent_a.2 sent_a.3 sent_a.4                    ID
0  0,0.0,0  0,0.0,0  0,0.0,0  1,1.0,1  id_1_2,id_2_1,id_1_2
1  3,3.0,3  1,2.0,1  6,6.0,6  1,1.0,1  id_1_3,id_2_2,id_1_3
2  2,nan,2  4,nan,4  0,nan,0  8,nan,8     id_1_4,nan,id_1_4
3  1,nan,1  0,nan,0  8,nan,8  6,nan,6     id_1_4,nan,id_1_4
  • Related