I have a list of lists (of variable length) that needs to be converted into a NumPy array. Example:
import numpy as np
sample_list = [["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]
sample_arr = np.asarray(sample_list)
>>> sample_arr
array([list(['hello', 'world']), list(['foo']),
       list(['alpha', 'beta', 'gamma']), list([])], dtype=object)
>>> sample_arr.shape
(4,)
In the above example, I get a one-dimensional array, which is what I want, and the downstream modules of my code expect the same. However, when all the lists have the same length, it outputs a 2-dimensional array, which causes errors in those downstream modules:
sample_list = [["hello"], ["world"], ["foo"], ["bar"]]
sample_arr = np.asarray(sample_list)
>>> sample_arr
array([['hello'],
       ['world'],
       ['foo'],
       ['bar']], dtype='<U5')
>>> sample_arr.shape
(4, 1)
Instead, I wanted the output similar to the first example:
>>> sample_arr
array([list(['hello']), list(['world']),
       list(['foo']), list(['bar'])], dtype=object)
Is there any way I can achieve that?
CodePudding user response:
You can turn it into a jagged-list by adding a dummy last sub-list, then slice it away:
sample_list = [["hello"], ["world"], ["foo"], ["bar"]]
sample_arr = np.asarray(sample_list + [[]], dtype=object)[:-1]
Output:
array([list(['hello']), list(['world']),
       list(['foo']), list(['bar'])], dtype=object)
I hope someone finds a less hacky solution :)
CodePudding user response:
A quick and dirty Pythonic approach is to use a list comprehension:
sample_arr = np.asarray([[j] for sub in sample_list for j in sub])
A little more info on list comprehensions if you're interested: https://www.w3schools.com/python/python_lists_comprehension.asp
CodePudding user response:
Yes, it's possible! You can define a function that converts the list of lists into a single list that contains all items as follows.
import numpy as np
def flatten_list(nested_list):
    single_list = []
    for item in nested_list:
        single_list.extend(item)
    return single_list
sample_arr = np.asarray(flatten_list([["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]))
print(sample_arr)
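For comparison, the same flattening can be sketched with the standard library's itertools.chain.from_iterable. Like the function above, this produces a flat array of strings rather than an array of lists; the variable names here are only illustrative.

```python
from itertools import chain

import numpy as np

nested = [["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]

# chain.from_iterable walks each sub-list in turn; empty sub-lists add nothing.
flat = list(chain.from_iterable(nested))
flat_arr = np.asarray(flat)

print(flat_arr.shape)  # (6,)
```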
CodePudding user response:
In your first case, np.array gives us a warning (in new enough numpy versions; from NumPy 1.24 on it raises a ValueError instead). That should tell us something - using np.array to make ragged arrays is not ideal. np.array is meant to create regular multidimensional arrays, with numeric (or string) dtypes. Creating an object dtype array like this is a fallback option.
In [96]: sample_list = [["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]
In [97]: arr = np.array(sample_list)
<ipython-input-97-ec7d58f98892>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
arr = np.array(sample_list)
In [98]: arr
Out[98]:
array([list(['hello', 'world']), list(['foo']),
       list(['alpha', 'beta', 'gamma']), list([])], dtype=object)
In many ways such an array is a debased list, not a true array.
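A small sketch of what that means in practice, assuming nothing beyond NumPy itself: the elements stay ordinary Python lists, so operations fall back to each list's own behavior instead of vectorized array operations.

```python
import numpy as np

arr = np.empty(2, dtype=object)
arr[0] = ["hello", "world"]
arr[1] = ["foo"]

# Each element is still a plain Python list.
print(type(arr[0]))   # <class 'list'>

# Multiplication is delegated to the lists, i.e. list repetition.
doubled = arr * 2
print(doubled[1])     # ['foo', 'foo']

# Per-element work needs a Python-level loop, just as with a list.
lengths = [len(item) for item in arr]
print(lengths)        # [2, 1]
```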
In the second case it can work as intended (by the developers, if not you!):
In [99]: sample_list = [["hello"], ["world"], ["foo"], ["bar"]]
In [100]: arr = np.array(sample_list)
In [101]: arr
Out[101]:
array([['hello'],
       ['world'],
       ['foo'],
       ['bar']], dtype='<U5')
To work around that, I recommend making an object dtype array of the right size, and populating it from the list:
In [102]: arr = np.empty(len(sample_list), object)
In [103]: arr
Out[103]: array([None, None, None, None], dtype=object)
In [104]: arr[:] = sample_list
In [105]: arr
Out[105]:
array([list(['hello']), list(['world']), list(['foo']), list(['bar'])],
      dtype=object)
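The same idea can be wrapped in a small helper so that equal-length and ragged inputs go through one code path. This is only a sketch: to_object_array is a hypothetical name, and it fills the array one element at a time instead of using the slice assignment shown above, so that NumPy is never asked to infer a shape from the nested lists.

```python
import numpy as np


def to_object_array(list_of_lists):
    """Return a 1-D object array whose elements are the given sub-lists."""
    arr = np.empty(len(list_of_lists), dtype=object)
    # Fill one slot at a time; each sub-list is stored as a single object.
    for i, sub in enumerate(list_of_lists):
        arr[i] = sub
    return arr


same_len = to_object_array([["hello"], ["world"], ["foo"], ["bar"]])
ragged = to_object_array([["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []])

print(same_len.shape, ragged.shape)  # (4,) (4,)
```

Both results are one-dimensional with dtype=object, which is what the downstream code in the question expects.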