ValueError: setting an array element with a sequence

Time:03-22

I am trying to build an np.array from a DataFrame column that contains float arrays of different lengths. Example of the column:

[0.123123, 0.123123]
[1.123112]
[0.123123, 0.123123, 0.123123, 0.123123]

and I am getting ValueError: setting an array element with a sequence

I tried:

np.array(df['vector'].tolist())
np.array(df['vector'].squeeze())
np.array(df['vector'].tolist(), dtype=object)

and they all raise the same ValueError.

pandas version 0.23.4

CodePudding user response:

If you want to concatenate all of the nested elements in your DataFrame column into a single 1D array, you can use np.hstack.

import numpy as np
import pandas as pd

df = pd.DataFrame({'vector': [
    [0.123123, 0.123123],
    [1.123112],
    [0.123123, 0.123123, 0.123123, 0.123123]
    ]}
)

np.hstack(df['vector'])
# returns:
# array([0.123123, 0.123123, 1.123112, 0.123123, 0.123123, 0.123123, 0.123123])
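
Flattening with np.hstack discards the row boundaries. If you need them back later, one possible approach (a sketch, not part of the answer above) is to record each row's length and cut the flat array with np.split. The names flat, lengths, offsets and rows are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'vector': [
    [0.123123, 0.123123],
    [1.123112],
    [0.123123, 0.123123, 0.123123, 0.123123],
]})

# Flatten every row into one 1D float array.
flat = np.hstack(df['vector'].tolist())

# Remember how long each original row was.
lengths = df['vector'].apply(len).values

# Cumulative lengths (without the last one) are the cut points.
offsets = np.cumsum(lengths)[:-1]

# Split the flat array back into one array per original row.
rows = np.split(flat, offsets)
```

Here rows is a plain Python list of 1D arrays, so the ragged shape is kept without asking NumPy to build a rectangular array.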

CodePudding user response:

You can use numpy.pad to pad every list to the same length and then convert the column to a 2D NumPy array, like below:

>>> df = pd.DataFrame({'vector':[[0.123123, 0.123123], [1.123112], [0.123123, 0.123123, 0.123123, 0.123123]]})
>>> df
                                     vector
0                      [0.123123, 0.123123]
1                                [1.123112]
2  [0.123123, 0.123123, 0.123123, 0.123123]

>>> max_len = df['vector'].apply(len).max()
>>> df['vector'] = df['vector'].apply(lambda x: np.pad(np.array(x), (0, max_len - len(x)), 'constant', constant_values=0))
>>> df
                                     vector
0            [0.123123, 0.123123, 0.0, 0.0]
1                 [1.123112, 0.0, 0.0, 0.0]
2  [0.123123, 0.123123, 0.123123, 0.123123]

>>> res = np.array(df['vector'].tolist())
>>> res
array([[0.123123, 0.123123, 0.      , 0.      ],
       [1.123112, 0.      , 0.      , 0.      ],
       [0.123123, 0.123123, 0.123123, 0.123123]])
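
Padding with 0 makes the padding indistinguishable from real zeros in the data. A possible variant (a sketch under that assumption, not the method from the answer above) pre-allocates a NaN-filled array and copies each row in, so the padded cells can be masked out or skipped by NaN-aware functions such as np.nanmean. The names padded, mask and row_means are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'vector': [
    [0.123123, 0.123123],
    [1.123112],
    [0.123123, 0.123123, 0.123123, 0.123123],
]})

lists = df['vector'].tolist()
max_len = max(len(v) for v in lists)

# Start from an all-NaN rectangle, one row per list.
padded = np.full((len(lists), max_len), np.nan)

# Copy each list into the left part of its row.
for i, v in enumerate(lists):
    padded[i, :len(v)] = v

# True where a cell holds real data, False where it is padding.
mask = ~np.isnan(padded)

# NaN-aware reductions ignore the padding.
row_means = np.nanmean(padded, axis=1)
```

This only works for float data, since NaN is a floating-point value; for integer columns a separate mask or a sentinel value would be needed.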