I am working on a project that uses a pandas DataFrame. I receive values in two of its columns, as shown below.
I need to add the pos_vec column and the word_vec column element-wise and store the result in a new column called sum_of_arrays. Each array in this new column should have size 2.
For example:
pos_vec                      word_vec                   sum_of_arrays
[-0.22683072, 0.32770252]    [0.3655883, 0.2535131]     [0.13875758, 0.58121562]
Can anyone help me? I'm stuck here. :(
Answer:
If you convert the lists to np.array objects, you can simply sum them, because NumPy adds arrays element-wise.
import pandas as pd
import numpy as np
# Sample data: each cell holds a 2-element list
df = pd.DataFrame({'pos_vec':[[-0.22683072,0.32770252],[0.14382899,0.049593687],[-0.24300802,-0.0908088],[-0.2507714,-0.18816864],[0.32294357,0.4486494]],
'word_vec':[[0.3655883,0.2535131],[0.33788466,0.038143277], [-0.047320127,0.28842866],[0.14382899,0.049593687],[-0.24300802,-0.0908088]]})
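If you want to use NumPy:
# applymap turns every cell into an np.array; .sum(1) then adds the arrays across each row
# (in pandas 2.1+, DataFrame.applymap is deprecated in favour of DataFrame.map)
df['col_sum'] = df[['pos_vec','word_vec']].applymap(lambda x: np.array(x)).sum(1)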
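A fully vectorized variant of the NumPy idea is sketched below. It assumes every cell holds a list of exactly two numbers, so each column can be stacked into a 2-D float array of shape (n_rows, 2) and added in a single operation instead of once per row. The sample data is just the first two rows from the answer above, restated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data in the same shape as the question: each cell holds a 2-element list.
df = pd.DataFrame({
    "pos_vec": [[-0.22683072, 0.32770252], [0.14382899, 0.049593687]],
    "word_vec": [[0.3655883, 0.2535131], [0.33788466, 0.038143277]],
})

# Stack each column into a 2-D float array of shape (n_rows, 2).
# This assumes every row has the same number of elements.
pos = np.array(df["pos_vec"].tolist(), dtype=float)
word = np.array(df["word_vec"].tolist(), dtype=float)

# One vectorized element-wise addition for the whole column.
summed = pos + word

# Store each row of the result back as a Python list (one list per cell).
df["sum_of_arrays"] = summed.tolist()
print(df["sum_of_arrays"])
```

Converting back with .tolist() matters: assigning a list of lists to a column stores one list per row, whereas a 2-D ndarray would be rejected because a single column cannot hold an (n_rows, 2) array.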
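If you don't want to use NumPy:
# zip pairs up the elements of the two lists in each row; the comprehension adds each pair
df['col_sum'] = df.apply(lambda row: [a + b for a, b in zip(row.pos_vec, row.word_vec)], axis=1)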
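If you stay with plain Python, note that zip silently stops at the shorter list, so a malformed row would just produce a shorter result. The sketch below adds an explicit length check to enforce the "size 2" requirement from the question. add_vectors and its size parameter are hypothetical names chosen for illustration.

```python
import pandas as pd


def add_vectors(a, b, size=2):
    """Hypothetical helper: element-wise sum of two vectors of the given size."""
    if len(a) != size or len(b) != size:
        raise ValueError(
            f"expected two vectors of length {size}, got {len(a)} and {len(b)}"
        )
    # zip pairs the i-th elements of both vectors; each pair is added.
    return [x + y for x, y in zip(a, b)]


df = pd.DataFrame({
    "pos_vec": [[-0.22683072, 0.32770252], [0.14382899, 0.049593687]],
    "word_vec": [[0.3655883, 0.2535131], [0.33788466, 0.038143277]],
})

# map() walks both columns in step, calling add_vectors once per row.
df["sum_of_arrays"] = list(map(add_vectors, df["pos_vec"], df["word_vec"]))
```

A row with the wrong number of elements now raises a ValueError instead of quietly producing a shorter list.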
Answer:
There may be cleaner approaches that use pandas to iterate over the columns, but here is the solution I came up with, which extracts the data from the DataFrame as lists:
# Extract data as lists
pos_vec = df["pos_vec"].tolist()
word_vec = df["word_vec"].tolist()
# Create new list with desired calculation
sum_of_arrays = [[x + y for x, y in zip(l1, l2)] for l1, l2 in zip(pos_vec, word_vec)]
# Add new list to DataFrame
df["sum_of_arrays"] = sum_of_arrays
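As one example of the "cleaner approach using pandas" mentioned above, Series.combine calls a function on each pair of elements from two Series, matched by index label. This is a sketch, not the answerer's method; the lambda is only an illustration, and it relies on both columns sharing the same index, which is always true for two columns of the same DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "pos_vec": [[-0.22683072, 0.32770252], [0.14382899, 0.049593687]],
    "word_vec": [[0.3655883, 0.2535131], [0.33788466, 0.038143277]],
})

# Series.combine calls the function once per index label with the
# pos_vec and word_vec values for that row; the results form a new Series.
df["sum_of_arrays"] = df["pos_vec"].combine(
    df["word_vec"],
    lambda a, b: [x + y for x, y in zip(a, b)],
)
```

This avoids building intermediate Python lists for the whole column, while still doing the per-element addition in plain Python.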