Converting list of lists into 1-D numpy array of lists

I have a list of lists (of variable len) that needs to be converted into a numpy array. Example:

import numpy as np

sample_list = [["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]
sample_arr = np.asarray(sample_list)

>>> sample_arr
array([list(['hello', 'world']), list(['foo']),
       list(['alpha', 'beta', 'gamma']), list([])], dtype=object)

>>> sample_arr.shape
(4,)

In the above example, I get a one-dimensional array, which is what I want; the downstream modules of my code expect exactly that. However, when all the lists have the same length, np.asarray returns a 2-dimensional array, which causes errors in those downstream modules:

sample_list = [["hello"], ["world"], ["foo"], ["bar"]]
sample_arr = np.asarray(sample_list)

>>> sample_arr
array([['hello'],
       ['world'],
       ['foo'],
       ['bar']], dtype='<U5')
>>> sample_arr.shape
(4, 1)

Instead, I wanted the output similar to the first example:

>>> sample_arr
array([list(['hello']), list(['world']),
       list(['foo']), list(['bar'])], dtype=object)

Is there any way I can achieve that?

CodePudding user response：

You can make the list ragged by appending a dummy empty sub-list, then slice the dummy away:

sample_list = [["hello"], ["world"], ["foo"], ["bar"]]
# dtype=object is needed for ragged input in NumPy 1.24 and later
sample_arr = np.asarray(sample_list + [[]], dtype=object)[:-1]

Output:

array([list(['hello']), list(['world']),
       list(['foo']), list(['bar'])], dtype=object)

I hope someone finds a less hacky solution :)

CodePudding user response：

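A quick and dirty Pythonic approach is to use a list comprehension. Note that this flattens the sub-lists and wraps every item in its own one-element list, so the result is a 2-D array of strings with one item per row, not a 1-D array of lists: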

sample_arr = np.asarray([[j] for sub in sample_list for j in sub])

A little more info on list comprehensions if you're interested: https://www.w3schools.com/python/python_lists_comprehension.asp

CodePudding user response：

Yes, it's possible! You can define a function that converts the list of lists into a single list that contains all items as follows.

import numpy as np
def flatten_list(nested_list):
    single_list = []
    for item in nested_list:
        single_list.extend(item)
    return single_list

sample_arr = np.asarray(flatten_list([["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]))
print(sample_arr)
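
For comparison, here is a minimal sketch of the same flattening using itertools.chain.from_iterable from the standard library (the name flat_arr is only illustrative). Keep in mind that flattening produces a 1-D array of strings, so the sub-list boundaries are lost; that is different from the 1-D array of lists the question asks for.

```python
from itertools import chain

import numpy as np

sample_list = [["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]

# chain.from_iterable concatenates the sub-lists into one flat sequence
flat_arr = np.asarray(list(chain.from_iterable(sample_list)))

print(flat_arr.shape)       # (6,)
print(flat_arr.dtype.kind)  # 'U' (fixed-width unicode strings, not lists)
```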

CodePudding user response：

In your first case, np.array gives us a warning in NumPy 1.19 through 1.23 (from 1.24 on it raises a ValueError unless you pass dtype=object). That should tell us something: using np.array to make ragged arrays is not ideal. np.array is meant to create regular multidimensional arrays with numeric (or string) dtypes. Creating an object dtype array like this is a fallback option.

In [96]: sample_list = [["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]
In [97]: arr = np.array(sample_list)
<ipython-input-97-ec7d58f98892>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  arr = np.array(sample_list)
In [98]: arr
Out[98]: 
array([list(['hello', 'world']), list(['foo']),
       list(['alpha', 'beta', 'gamma']), list([])], dtype=object)

In many ways such an array is a debased list, not a true array.
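
One way to see this, as a small sketch: arithmetic on an object array is delegated to the Python objects it holds, so + concatenates the lists element by element instead of doing any numeric addition. The array is built here with np.empty and per-element assignment so the snippet does not rely on the deprecated ragged constructor; the names arr and doubled are illustrative.

```python
import numpy as np

sample_list = [["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]

# Build the object array explicitly, one list per slot
arr = np.empty(len(sample_list), dtype=object)
for i, sub in enumerate(sample_list):
    arr[i] = sub

# "+" falls back to list.__add__ for every element
doubled = arr + arr
print(doubled[0])             # ['hello', 'world', 'hello', 'world']

# Anything per-element still needs ordinary Python iteration
print([len(x) for x in arr])  # [2, 1, 3, 0]
```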

In the second case it can work as intended (by the developers, if not you!):

In [99]: sample_list = [["hello"], ["world"], ["foo"], ["bar"]]
In [100]: arr = np.array(sample_list)
In [101]: arr
Out[101]: 
array([['hello'],
       ['world'],
       ['foo'],
       ['bar']], dtype='<U5')

To work around that, I recommend making an object dtype array of the right size, and populating it from the list:

In [102]: arr = np.empty(len(sample_list), object)
In [103]: arr
Out[103]: array([None, None, None, None], dtype=object)
In [104]: arr[:] = sample_list
In [105]: arr
Out[105]: 
array([list(['hello']), list(['world']), list(['foo']), list(['bar'])],
      dtype=object)
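
The recipe above can be wrapped in a small helper. This is only a sketch: to_object_array is a hypothetical name, and it fills the array element by element rather than with arr[:] = ..., a choice made here so the result does not depend on how NumPy interprets the right-hand side of a slice assignment.

```python
import numpy as np


def to_object_array(nested):
    """Return a 1-D object array whose elements are the items of `nested`."""
    arr = np.empty(len(nested), dtype=object)
    for i, item in enumerate(nested):
        arr[i] = item  # stores the list itself, no unpacking
    return arr


same_len = to_object_array([["hello"], ["world"], ["foo"], ["bar"]])
ragged = to_object_array([["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []])

print(same_len.shape, same_len.dtype)  # (4,) object
print(ragged.shape, ragged.dtype)      # (4,) object
```

Both inputs give the same shape, so downstream code sees a 1-D array of lists whether or not the sub-lists happen to have equal lengths.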