Assign numpy matrix to pandas columns

I have a dataframe with 48870 rows and computed embeddings with shape (48870, 768).

I want to assign these embeddings to a pandas column. When I try

test['original_text_embeddings'] = embeddings

I get an error: Wrong number of items passed 768, placement implies 1. I know that something like df.loc['original_text_embeddings'] = embeddings[0] will work, but I need to automate this process.

CodePudding user response：

Your embeddings have 768 columns, which would translate to 768 columns in a data frame. You are trying to assign all of those columns to a single data frame column, which a plain assignment of a 2-D array cannot do.

What you could do instead is build a new data frame from the embeddings and concatenate it with the test df:

embedding_df = pd.DataFrame(embeddings)

test = pd.concat([test, embedding_df], axis=1)

Have a look at the documentation for how indexes are handled and how to concatenate along different axes: https://pandas.pydata.org/docs/reference/api/pandas.concat.html
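
One thing to watch with this approach: pd.concat with axis=1 aligns rows by index label, so if test does not have the default 0..n-1 index, the result can contain NaN rows. Below is a minimal sketch, assuming small stand-in data instead of the real 48870 x 768 matrix. The emb_ column prefix and the sample index values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data: 5 rows, 4 embedding dimensions
test = pd.DataFrame({"original_text": list("abcde")}, index=[10, 11, 12, 13, 14])
embeddings = np.arange(20, dtype=float).reshape(5, 4)

# Reuse test's index so that concat lines the rows up label by label,
# and give the embedding columns readable (illustrative) names
embedding_df = pd.DataFrame(
    embeddings,
    index=test.index,
    columns=[f"emb_{i}" for i in range(embeddings.shape[1])],
)

result = pd.concat([test, embedding_df], axis=1)
print(result.shape)
```

Without index=test.index, embedding_df would get labels 0..4, and the concatenation here would produce 10 partly empty rows rather than 5 complete ones.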

CodePudding user response：

A dataframe column (a Series) needs a 1-D list or array:

In [84]: x = np.arange(12).reshape(3,4)
In [85]: pd.Series(x)
...
ValueError: Data must be 1-dimensional

Splitting the array into a list (of arrays):

In [86]: pd.Series(list(x))
Out[86]: 
0      [0, 1, 2, 3]
1      [4, 5, 6, 7]
2    [8, 9, 10, 11]
dtype: object
In [87]: _.to_numpy()
Out[87]: 
array([array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])],
      dtype=object)
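
Applied to the question, the same idea might look like the sketch below. It assumes a small stand-in for test and embeddings, and keeps the column name from the question. Each cell of the new column then holds one whole embedding vector, and the column has dtype object.

```python
import numpy as np
import pandas as pd

# Small stand-ins for the question's dataframe and embedding matrix
test = pd.DataFrame({"original_text": ["a", "b", "c"]})
embeddings = np.arange(12).reshape(3, 4)

# list(embeddings) yields one 1-D array per row, so each cell gets one vector
test["original_text_embeddings"] = list(embeddings)

# Stack the cells back into a 2-D array when the matrix is needed again
restored = np.stack(test["original_text_embeddings"].to_numpy())
print(restored.shape)
```

An object column like this is convenient for keeping each vector next to its text, but vectorized numeric operations need the matrix rebuilt first, for example with np.stack as above.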