I have a DataFrame with 48870 rows
and computed embeddings with shape (48870, 768).
I want to assign these embeddings to a pandas column. When I try
test['original_text_embeddings'] = embeddings
I get an error: Wrong number of items passed 768, placement implies 1
I know that something like df.loc['original_text_embeddings'] = embeddings[0] will work, but I need to automate this process.
CodePudding user response:
Your embeddings have 768 columns, which would translate to 768 columns in a data frame. You are trying to assign all of those columns to a single data frame column, which is not possible.
What you could do is generate a new data frame from the embeddings and concatenate it with the test data frame:
embedding_df = pd.DataFrame(embeddings)
test = pd.concat([test, embedding_df], axis=1)
Have a look at the documentation for how concat handles indexes and the different axes: https://pandas.pydata.org/docs/reference/api/pandas.concat.html
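One thing to watch with this approach is index alignment: pd.concat with axis=1 matches rows by index label, so if test does not have a default 0..n-1 index, the embedding rows can end up misaligned or padded with NaN. A minimal sketch, assuming small stand-in data (3 rows, 4 dimensions) instead of the real (48870, 768) array; the "emb_" prefix and the name wide are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data; note the non-default index.
test = pd.DataFrame({"original_text": ["a", "b", "c"]}, index=[10, 20, 30])
embeddings = np.arange(12, dtype=float).reshape(3, 4)

# Reuse test's index so concat aligns row by row rather than on mismatched labels.
# add_prefix turns the integer column labels 0..3 into "emb_0".."emb_3".
embedding_df = pd.DataFrame(embeddings, index=test.index).add_prefix("emb_")

wide = pd.concat([test, embedding_df], axis=1)
print(wide.shape)
```

This gives one column per embedding dimension, which is convenient for column-wise operations but makes for a very wide frame at 768 dimensions.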
CodePudding user response:
A DataFrame column (or a Series) needs a 1-D list or array:
In [84]: x = np.arange(12).reshape(3,4)
In [85]: pd.Series(x)
...
ValueError: Data must be 1-dimensional
Splitting the array into a list (of arrays):
In [86]: pd.Series(list(x))
Out[86]:
0      [0, 1, 2, 3]
1      [4, 5, 6, 7]
2    [8, 9, 10, 11]
dtype: object
In [87]: _.to_numpy()
Out[87]:
array([array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])],
      dtype=object)
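Applied to the question, the same idea would look roughly like the sketch below: wrapping the 2-D array in list() yields one 1-D array per row, which fits in a single object-dtype column. The data here is a small stand-in (3 rows, 4 dimensions) for the real (48870, 768) array, and restored is just an illustrative name:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data.
test = pd.DataFrame({"original_text": ["a", "b", "c"]})
embeddings = np.arange(12, dtype=float).reshape(3, 4)

# list(embeddings) splits the 2-D array into a list of 1-D row arrays,
# so each cell of the new column holds one embedding vector.
test["original_text_embeddings"] = list(embeddings)

# To get the 2-D array back, stack the per-row arrays again.
restored = np.stack(test["original_text_embeddings"].to_numpy())
print(restored.shape)
```

The column has dtype object, so vectorized numeric operations will not apply to it directly; stacking it back into a 2-D array first is usually the easier route for any computation.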