Numpy 2D Matrix showing list type for nested matrix elements

Time:04-01

Currently I am trying to pass a 2D matrix into the sklearn OneHotEncoder. Whenever I try to pass the matrix I get this error:

Encoders require their input to be uniformly strings or numbers. Got ['list']

After a bit of investigation, I see that the returned matrix prints as:

[list(['e2', 'e4', 'e5']) list(['e1', 'e2', 'e3', 'e4'])
 list(['e1', 'e2']) list(['e1', 'e2', 'e3', 'e4', 'e5'])
 list(['e1', 'e2', 'e3', 'e4', 'e5'])
 list(['e1', 'e2', 'e3', 'e4', 'e5', 'e6'])]

As you can see, this is not a plain 2D matrix: the outer array looks correct, but each inner element is displayed wrapped in list(). How can I fix this?

Below is the code I use to get the IDS column from the pandas dataframe as an array:

arr = np.asarray(result['IDS'], dtype=object)

CodePudding user response:

Using a copy-n-paste from your question:

In [239]: [list(['e2', 'e4', 'e5']), list(['e1', 'e2', 'e3', 'e4']),
     ...:  list(['e1', 'e2']), list(['e1', 'e2', 'e3', 'e4', 'e5']),
     ...:  list(['e1', 'e2', 'e3', 'e4', 'e5']),
     ...:  list(['e1', 'e2', 'e3', 'e4', 'e5', 'e6'])]
Out[239]: 
[['e2', 'e4', 'e5'],
 ['e1', 'e2', 'e3', 'e4'],
 ['e1', 'e2'],
 ['e1', 'e2', 'e3', 'e4', 'e5'],
 ['e1', 'e2', 'e3', 'e4', 'e5'],
 ['e1', 'e2', 'e3', 'e4', 'e5', 'e6']]
In [240]: np.array(_)
<ipython-input-240-7a2cd91c32ca>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  np.array(_)
Out[240]: 
array([list(['e2', 'e4', 'e5']), list(['e1', 'e2', 'e3', 'e4']),
       list(['e1', 'e2']), list(['e1', 'e2', 'e3', 'e4', 'e5']),
       list(['e1', 'e2', 'e3', 'e4', 'e5']),
       list(['e1', 'e2', 'e3', 'e4', 'e5', 'e6'])], dtype=object)

I assume you used the object dtype because you got this 'ragged' warning:

np.asarray(result['IDS'], dtype=object)

I also assume result['IDS'] looks a lot like Out[239], a list of lists that vary in length. More precisely, result is a dataframe, so result['IDS'] is a Series, one column of that dataframe.
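
A minimal sketch of what that assumption implies, using the six lists from the question (`rows` is only a stand-in for result['IDS'].tolist(), which I have not seen): the object-dtype array is 1d, and each element is still a plain Python list.

```python
import numpy as np

# Stand-in for result['IDS'].tolist(); copied from the question's output.
rows = [['e2', 'e4', 'e5'],
        ['e1', 'e2', 'e3', 'e4'],
        ['e1', 'e2'],
        ['e1', 'e2', 'e3', 'e4', 'e5'],
        ['e1', 'e2', 'e3', 'e4', 'e5'],
        ['e1', 'e2', 'e3', 'e4', 'e5', 'e6']]

# Ragged input plus dtype=object gives a 1d array whose elements are lists.
arr = np.array(rows, dtype=object)

print(arr.shape)                  # (6,)
print(type(arr[0]))               # <class 'list'>
lengths = [len(x) for x in arr]
print(lengths)                    # [3, 4, 2, 5, 5, 6]
```

So the list(...) in the display is not a formatting glitch. It is how numpy shows list objects stored in an object array, and that is what the encoder is complaining about.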

You might want to show result or result['IDS'], though I can guess what it looks like.

What kind of 2d array were you expecting? With component lists that vary from 2 to 6 elements, there's no way you can make a 2d array!
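
If what you are after is one column per distinct value, one possible workaround is to build a 0/1 indicator matrix yourself. This is only a sketch in plain numpy, assuming the six lists from the question; `rows`, `labels`, `col` and `indicator` are illustrative names, not anything from your code.

```python
import numpy as np

# Stand-in for result['IDS'].tolist(); copied from the question's output.
rows = [['e2', 'e4', 'e5'],
        ['e1', 'e2', 'e3', 'e4'],
        ['e1', 'e2'],
        ['e1', 'e2', 'e3', 'e4', 'e5'],
        ['e1', 'e2', 'e3', 'e4', 'e5'],
        ['e1', 'e2', 'e3', 'e4', 'e5', 'e6']]

# Every distinct value gets one column, in sorted order.
labels = sorted({item for row in rows for item in row})
col = {label: j for j, label in enumerate(labels)}

# indicator[i, j] is 1 when row i contains labels[j], else 0.
indicator = np.zeros((len(rows), len(labels)), dtype=int)
for i, row in enumerate(rows):
    for item in row:
        indicator[i, col[item]] = 1

print(labels)           # ['e1', 'e2', 'e3', 'e4', 'e5', 'e6']
print(indicator.shape)  # (6, 6)
print(indicator[0])     # [0 1 0 1 1 0]
```

scikit-learn's MultiLabelBinarizer is meant for this kind of list-per-row input and may be a better fit than OneHotEncoder here, but whether it matches what you need depends on what you want the encoding to represent.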

Making a Series:

In [243]: S = pd.Series(Out[239])
In [244]: S
Out[244]: 
0                [e2, e4, e5]
1            [e1, e2, e3, e4]
2                    [e1, e2]
3        [e1, e2, e3, e4, e5]
4        [e1, e2, e3, e4, e5]
5    [e1, e2, e3, e4, e5, e6]
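
If you do want a rectangular table from a Series like this, one option is to let pandas pad the shorter lists with missing values. A sketch, assuming a Series like S above (rebuilt here so the snippet runs on its own; `padded` is an illustrative name):

```python
import pandas as pd

# Same six lists as in the question, rebuilt as a Series.
S = pd.Series([['e2', 'e4', 'e5'],
               ['e1', 'e2', 'e3', 'e4'],
               ['e1', 'e2'],
               ['e1', 'e2', 'e3', 'e4', 'e5'],
               ['e1', 'e2', 'e3', 'e4', 'e5'],
               ['e1', 'e2', 'e3', 'e4', 'e5', 'e6']])

# Building a DataFrame from the list of lists pads short rows with missing values.
padded = pd.DataFrame(S.tolist())

print(padded.shape)                         # (6, 6)
print(padded.notna().sum(axis=1).tolist())  # [3, 4, 2, 5, 5, 6]
```

The padded cells are missing values, so they would still need handling before this goes into an encoder.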