I am trying to sum all values from grouped columns using the aggregate syntax:
    import numpy as np
    import pandas as pd

    df2 = pd.DataFrame([[1, np.array([2, 5, 3])],
                        [1, np.array([2, 5, 3])],
                        [1, np.array([2, 5, 3, 5, 3])]],
                       columns=['doc_id', 'topic_dist'])
When I run the following code to aggregate arrays with different shapes:
    def getsumcolumns(dfsource):
        grouped = dfsource.groupby('doc_id')
        aggregate = list((k, v["topic_dist"].sum()) for k, v in grouped)
        df_results = pd.DataFrame(aggregate, columns=['doc_id', 'topic_dist'])
        print(df_results)
        return df_results
I get this error message:
operands could not be broadcast together with shapes (3,) (5,)
Expected values:
       doc_id        topic_dist
    0       1  [6, 15, 9, 5, 3]
Any ideas on how to get the sum of these columns?
CodePudding user response:
You need to split the Series of lists into a two-dimensional array / DataFrame before applying the sum:
    for k, v in df2.groupby('doc_id'):
        print(k, v['topic_dist'].apply(pd.Series).sum().to_list(), sep='\t')
Output
1 [6.0, 15.0, 9.0, 5.0, 3.0]
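The values come back as floats because `apply(pd.Series)` fills the shorter rows with NaN before summing. If you want integers and a DataFrame shaped like the expected result, one possible alternative is to zero-pad the arrays yourself. This is only a sketch: `sum_ragged` is a hypothetical helper name, and it assumes every entry in `topic_dist` is a 1-D NumPy array.

```python
import numpy as np
import pandas as pd


def sum_ragged(arrays):
    """Element-wise sum of 1-D arrays of different lengths.

    Shorter arrays are treated as if they were padded with zeros.
    (Hypothetical helper, not part of pandas or NumPy.)
    """
    arrays = list(arrays)
    width = max(len(a) for a in arrays)
    # Use a dtype that can hold every input, so ints stay ints.
    total = np.zeros(width, dtype=np.result_type(*arrays))
    for a in arrays:
        total[:len(a)] += a
    return total


df2 = pd.DataFrame([[1, np.array([2, 5, 3])],
                    [1, np.array([2, 5, 3])],
                    [1, np.array([2, 5, 3, 5, 3])]],
                   columns=['doc_id', 'topic_dist'])

# One (doc_id, summed array) pair per group, as in the question's function.
rows = [(k, sum_ragged(v['topic_dist'])) for k, v in df2.groupby('doc_id')]
df_results = pd.DataFrame(rows, columns=['doc_id', 'topic_dist'])
print(df_results)
```

This keeps the loop over groups from the question and only replaces the `.sum()` call, which is the part that fails when the shapes differ.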