How to convert the values in a Python Defaultdict to a Numpy array?

Time:01-04

I want multiple values to belong to the same key, so I used a Python defaultdict to work around this. However, the values in the defaultdict are now nested lists. How do I make each element of those lists a row of a NumPy ndarray?

Let's say my defaultdict looks like this:

from collections import defaultdict

my_dict = defaultdict(list)

# inside some for loop:
my_dict[key].append(value)  # key is a string and value is a NumPy array of shape (1,10)

I guess the slowest way would be using a nested for loop like:

data = np.empty((0,10),np.uint8)
for i in my_dict:
    for j in my_dict[i]:
        data = np.append(data,j,axis=0)   

Is there a faster way to do this?

CodePudding user response:

Instead of using defaultdict(list), use dict.setdefault. This will spare you the nested list:

my_dict = dict()
for key, value in values:
    my_dict[key] = np.append(my_dict.setdefault(key, value), value)

data = np.array(list(my_dict.values()))

CodePudding user response:

You should have provided an example, but I think the following is as general as your code implies.

In [131]: from collections import defaultdict
In [132]: dd = defaultdict(list)
In [133]: dd[1].append(np.ones((1,5),int))
In [134]: dd[2].append(2*np.ones((1,5),int))
In [135]: dd[1].append(3*np.ones((1,5),int))

In [136]: dd
Out[136]: 
defaultdict(list,
            {1: [array([[1, 1, 1, 1, 1]]), array([[3, 3, 3, 3, 3]])],
             2: [array([[2, 2, 2, 2, 2]])]})

Several people suggested making an array from:

In [137]: list(dd.values())
Out[137]: 
[[array([[1, 1, 1, 1, 1]]), array([[3, 3, 3, 3, 3]])],
 [array([[2, 2, 2, 2, 2]])]]

But since a list may hold more than one array, the sublists can differ in length, so that won't produce a regular array.

We can flatten the nested lists with something similar to your code, but with a faster list append:

In [140]: alist = []
     ...: for i in dd:
     ...:     for a in dd[i]:
     ...:         alist.append(a)
     ...:         
In [141]: alist
Out[141]: [array([[1, 1, 1, 1, 1]]), array([[3, 3, 3, 3, 3]]), array([[2, 2, 2, 2, 2]])]

We can make a 2d array from this (provided the subarrays match in shape):

In [142]: np.vstack(alist)
Out[142]: 
array([[1, 1, 1, 1, 1],
       [3, 3, 3, 3, 3],
       [2, 2, 2, 2, 2]])

or:

In [144]: np.array(alist).shape
Out[144]: (3, 1, 5)
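
If a 2d result is wanted from the np.array route, the length-1 middle axis can be collapsed afterwards. A minimal sketch, assuming every subarray has shape (1, 5) as in the session above; the names stacked, rows and same are only illustrative:

```python
import numpy as np

# Same three (1, 5) blocks as in the session above.
alist = [np.ones((1, 5), int), 3 * np.ones((1, 5), int), 2 * np.ones((1, 5), int)]

stacked = np.array(alist)                      # adds a new leading axis: (3, 1, 5)
rows = stacked.reshape(-1, stacked.shape[-1])  # collapse to (3, 5)
same = np.concatenate(alist, axis=0)           # joins along the existing first axis: (3, 5)

print(stacked.shape, rows.shape, same.shape)
```

np.vstack and np.concatenate(..., axis=0) skip the extra axis entirely, which is why they are usually the more direct choice here.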

As a general rule, repeated np.append is inefficient, since each call copies the whole array. A list append (or a list comprehension) is best when iteration is unavoidable.
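
The list-comprehension variant might look like the sketch below. It assumes all stored arrays share the same number of columns; the string keys "a" and "b" and the names flat, data and data_chain are made up for the example:

```python
from collections import defaultdict
from itertools import chain

import numpy as np

dd = defaultdict(list)
dd["a"].append(np.ones((1, 5), int))
dd["b"].append(2 * np.ones((1, 5), int))
dd["a"].append(3 * np.ones((1, 5), int))

# Flatten the per-key lists in one pass, then join the arrays once.
flat = [arr for arrays in dd.values() for arr in arrays]
data = np.concatenate(flat, axis=0)

# The same flattening with itertools.chain instead of a nested comprehension.
data_chain = np.vstack(list(chain.from_iterable(dd.values())))

print(data)
```

Rows come out grouped by key in insertion order, not in the order the arrays were appended.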

Trying to recreate the dict with @Guy's suggestion:

In [147]: my_dict = dict()
     ...: key,value=(1,np.ones((1,5),int)); my_dict[key]= np.append(my_dict.setdefault(key, value), value)

I would prefer to use np.hstack here (np.append is misused too often).

In [148]: key,value=(2,2*np.ones((1,5),int)); my_dict[key]= np.append(my_dict.setdefault(key, value), value)    
In [149]: key,value=(1,3*np.ones((1,5),int)); my_dict[key]= np.append(my_dict.setdefault(key, value), value)

In [150]: my_dict
Out[150]: 
{1: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3]),
 2: array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])}

This duplicates the first value stored for each key: setdefault stores value for a new key, and np.append then adds the same value again. And making an array from list(my_dict.values()) is no easier.
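
One way to avoid the duplicated first value is to branch on whether the key is already present. This is a sketch of a variant, not @Guy's code: it swaps np.append for np.vstack so each value stays a row, and pairs is a hypothetical stand-in for whatever the loop iterates over:

```python
import numpy as np

# Hypothetical input: (key, value) pairs with values of shape (1, 5).
pairs = [
    (1, np.ones((1, 5), int)),
    (2, 2 * np.ones((1, 5), int)),
    (1, 3 * np.ones((1, 5), int)),
]

my_dict = {}
for key, value in pairs:
    if key in my_dict:
        # Key seen before: add the new value as an extra row.
        my_dict[key] = np.vstack((my_dict[key], value))
    else:
        # First value for this key: store it once, without appending it again.
        my_dict[key] = value

data = np.vstack(list(my_dict.values()))
print(data.shape)
```

Each np.vstack call still copies the array stored for that key, so this does not remove the cost of growing arrays inside a loop.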

We could collect the dict values as arrays, but it's not as simple as with lists. An array doesn't have a simple "empty", and doesn't have an in-place "append".

In [157]: dd = defaultdict(lambda: np.zeros([0,5],int))
In [158]: dd[1]=np.vstack((dd[1],(np.ones((1,5),int))))
In [159]: dd[2]=np.vstack((dd[2],(2*np.ones((1,5),int))))
In [160]: dd[3]=np.vstack((dd[3],(3*np.ones((1,5),int))))

In [161]: dd
Out[161]: 
defaultdict(<function __main__.<lambda>()>,
            {1: array([[1, 1, 1, 1, 1]]),
             2: array([[2, 2, 2, 2, 2]]),
             3: array([[3, 3, 3, 3, 3]])})

In [162]: np.vstack(list(dd.values()))
Out[162]: 
array([[1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3]])

This avoids an iteration after the dict is constructed, but the dict construction is more complex and slower. So I don't think it helps.
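
To check the speed claim on your own data, a rough timing sketch like the following could be used. The keys, sizes and function names are invented for the test, and the timings will vary by machine, so no numbers are quoted here:

```python
import timeit
from collections import defaultdict

import numpy as np

rng = np.random.default_rng(0)
keys = [f"k{i % 10}" for i in range(500)]                     # hypothetical keys
values = [rng.integers(0, 255, size=(1, 10)) for _ in keys]   # (1, 10) arrays

def with_lists():
    # Collect arrays in lists, flatten, and join once at the end.
    dd = defaultdict(list)
    for k, v in zip(keys, values):
        dd[k].append(v)
    return np.vstack([a for arrays in dd.values() for a in arrays])

def with_arrays():
    # Grow one array per key with vstack on every step.
    dd = defaultdict(lambda: np.zeros((0, 10), int))
    for k, v in zip(keys, values):
        dd[k] = np.vstack((dd[k], v))
    return np.vstack(list(dd.values()))

# Both strategies should produce the same rows in the same order.
assert np.array_equal(with_lists(), with_arrays())

print("lists :", timeit.timeit(with_lists, number=20))
print("arrays:", timeit.timeit(with_arrays, number=20))
```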
