Sum aggregate DataFrame columns with different shapes

Time:06-18

I'm trying to sum all values in grouped columns using aggregate syntax:

import numpy as np
import pandas as pd

df2 = pd.DataFrame([[1, np.array([2, 5, 3])],
                    [1, np.array([2, 5, 3])],
                    [1, np.array([2, 5, 3, 5, 3])]],
                   columns=['doc_id', 'topic_dist'])

When I run the following code to aggregate arrays of different shapes:

def getsumcolumns(dfsource):
    grouped = dfsource.groupby('doc_id')
    aggregate = list((k, v["topic_dist"].sum()) for k, v in grouped)
    df_results = pd.DataFrame(aggregate, columns=['doc_id', 'topic_dist'])
    print(df_results)
    return df_results

getsumcolumns(df2)

I get this error message:

operands could not be broadcast together with shapes (3,) (5,) 
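The error comes from NumPy broadcasting: summing an object column that holds arrays adds the stored arrays to each other with +, and arrays of shapes (3,) and (5,) cannot be broadcast together. A minimal reproduction, assuming only NumPy:

```python
import numpy as np

a = np.array([2, 5, 3])        # shape (3,)
b = np.array([2, 5, 3, 5, 3])  # shape (5,)

message = None
try:
    a + b  # element-wise addition of unequal-length 1-D arrays
except ValueError as err:
    message = str(err)  # same "could not be broadcast" message as above

print(message)
```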

Expected result:

   doc_id        topic_dist
0       1  [6, 15, 9, 5, 3]

Any ideas how to get the sum of these columns?

Answer:

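You need to expand the Series of arrays into a two-dimensional structure (here a DataFrame, via apply(pd.Series)) before applying the sum. Each array becomes one row, shorter arrays are padded with NaN, and sum() then adds each column while skipping the NaN values.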
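To make the intermediate step visible, here is a small sketch that builds the padded two-dimensional frame from the three arrays in the question and sums it column-wise. The names s and wide are illustrative only:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.array([2, 5, 3]),
               np.array([2, 5, 3]),
               np.array([2, 5, 3, 5, 3])])

# Each array becomes one row; positions missing from shorter arrays are NaN
wide = s.apply(pd.Series)
print(wide)

# Column-wise sum; NaN values are skipped by default (skipna=True)
print(wide.sum().to_list())
```

Because of the NaN padding, the columns are float64, which is why the sums come out as floats.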

for k, v in df2.groupby('doc_id'):
    print(k, v['topic_dist'].apply(pd.Series).sum().to_list(), sep='\t')

Output

1       [6.0, 15.0, 9.0, 5.0, 3.0]
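If integer results are wanted, as in the expected output, one option is to pad the shorter arrays with zeros instead of NaN. The sketch below is a variant of the question's getsumcolumns built on a hypothetical helper pad_and_sum (not a pandas or NumPy function); it assumes every group holds 1-D numeric arrays:

```python
import numpy as np
import pandas as pd

def pad_and_sum(arrays):
    # Hypothetical helper: sums 1-D arrays of unequal length position by
    # position, treating positions past the end of a shorter array as 0.
    arrays = [np.asarray(a) for a in arrays]
    width = max(len(a) for a in arrays)
    total = np.zeros(width, dtype=np.result_type(*arrays))
    for a in arrays:
        total[:len(a)] += a
    return total

def getsumcolumns(dfsource):
    # Group the arrays by doc_id and reduce each group with pad_and_sum
    grouped = dfsource.groupby('doc_id')['topic_dist']
    aggregate = [(k, pad_and_sum(v).tolist()) for k, v in grouped]
    return pd.DataFrame(aggregate, columns=['doc_id', 'topic_dist'])

df2 = pd.DataFrame([[1, np.array([2, 5, 3])],
                    [1, np.array([2, 5, 3])],
                    [1, np.array([2, 5, 3, 5, 3])]],
                   columns=['doc_id', 'topic_dist'])

result = getsumcolumns(df2)
print(result)
```

Zero-padding keeps the integer dtype of the inputs, whereas the apply(pd.Series) approach goes through NaN and therefore float64.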