Appending arrays stored in dicts into a single dict in Python

Time:11-12

I have an array of dicts with the same keys, and each key holds an array of values, as in the example below:

dict_arr = np.array([{'x': np.array([1,2,3]), 'y': np.array([1,4,9])}, {'x': np.array([4,5,6]), 'y': np.array([16,25,64])}])

I need to merge everything into a single dict, where merging means concatenating the arrays stored under the same key. The expected output for the example is:

{'x': array([1., 2., 3., 4., 5., 6.]), 'y': array([ 1.,  4.,  9., 16., 25., 64.])}

The code I wrote is the following:

new_dict = {'x': np.array([]), 'y': np.array([])}
for dict_ in dict_arr:
    for key, value in dict_.items():
        new_dict[key] = np.append(new_dict[key], value)
print(new_dict)

It gives me the expected output, but I wonder if there is a smarter way of doing this: one that merges the 'x' arrays from all dicts at once, instead of iterating over every key and value of every dict and updating my new dict each time.

Note: my real array of dicts is generated by a data acquisition board that can't acquire more than 1024 data points per channel, so every dict in the list, except the last one, has from one to three arrays of 1024 float values each, and the number of dicts I have is typically on the order of 400 thousand.

CodePudding user response:

Appending to a list is faster than repeated np.append, because np.append allocates a new array and copies all the existing data on every call:

In [44]: dict_arr = np.array([{'x': np.array([1,2,3]), 'y': np.array([1,4,9])}, {'x': np.array([4,5,6]), 'y': np.array([16,25,64])}])
In [45]: new_dict={'x':[], 'y':[]}
In [46]: for d in dict_arr:   # 'd' avoids shadowing the built-in dict
    ...:     for key,value in d.items():
    ...:         new_dict[key].append(value)
    ...: 
In [47]: new_dict
Out[47]: 
{'x': [array([1, 2, 3]), array([4, 5, 6])],
 'y': [array([1, 4, 9]), array([16, 25, 64])]}
In [48]: newer = {key:np.hstack(value) for key,value in new_dict.items()}
In [49]: newer
Out[49]: {'x': array([1, 2, 3, 4, 5, 6]), 'y': array([ 1,  4,  9, 16, 25, 64])}
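
To check the claim that list append beats repeated np.append on your own machine, here is a rough timing sketch. The synthetic data (200 dicts, 1024 floats per key) and the function names are assumptions made for illustration, and the sizes are far smaller than the real 400 thousand dicts; np.concatenate is used in place of np.hstack, which gives the same result for 1-D arrays.

```python
import timeit
import numpy as np

# Synthetic stand-in for the acquisition data (sizes are illustrative assumptions).
rng = np.random.default_rng(0)
chunks = [{'x': rng.random(1024), 'y': rng.random(1024)} for _ in range(200)]

def merge_with_np_append(dicts):
    # np.append returns a new array, so all accumulated data is copied each time.
    out = {'x': np.array([]), 'y': np.array([])}
    for d in dicts:
        for key, value in d.items():
            out[key] = np.append(out[key], value)
    return out

def merge_with_lists(dicts):
    # Lists only store references; the data is copied once per key at the end.
    parts = {'x': [], 'y': []}
    for d in dicts:
        for key, value in d.items():
            parts[key].append(value)
    return {key: np.concatenate(value) for key, value in parts.items()}

a = merge_with_np_append(chunks)
b = merge_with_lists(chunks)
same = all(np.array_equal(a[k], b[k]) for k in a)

t_append = timeit.timeit(lambda: merge_with_np_append(chunks), number=3)
t_lists = timeit.timeit(lambda: merge_with_lists(chunks), number=3)
print(f"same result: {same}")
print(f"np.append: {t_append:.4f}s, list + concatenate: {t_lists:.4f}s")
```

The gap should widen as the number of dicts grows, since the np.append loop recopies an ever longer array; timing it on a slice of the real data is the only reliable test.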

defaultdict can streamline this kind of dictionary build:

In [55]: from collections import defaultdict
In [56]: dd = defaultdict(list)
In [58]: for d in dict_arr:   # 'd' avoids shadowing the built-in dict
    ...:     for k,v in d.items():
    ...:         dd[k].append(v)
    ...: 
In [59]: dd
Out[59]: 
defaultdict(list,
            {'x': [array([1, 2, 3]), array([4, 5, 6])],
             'y': [array([1, 4, 9]), array([16, 25, 64])]})
In [60]: newer = {key:np.hstack(value) for key,value in dd.items()}
In [61]: newer
Out[61]: {'x': array([1, 2, 3, 4, 5, 6]), 'y': array([ 1,  4,  9, 16, 25, 64])}
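
The defaultdict version can be wrapped in a small reusable helper. This is only a sketch: the name merge_dict_arrays is hypothetical, and it assumes every value is a 1-D array, so np.concatenate can stand in for np.hstack.

```python
from collections import defaultdict
import numpy as np

def merge_dict_arrays(dicts):
    """Concatenate the 1-D arrays stored under each key across all dicts."""
    parts = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            parts[key].append(value)  # cheap: stores a reference only
    # One copy per key, done after all chunks have been collected.
    return {key: np.concatenate(chunks) for key, chunks in parts.items()}

dict_arr = np.array([
    {'x': np.array([1, 2, 3]), 'y': np.array([1, 4, 9])},
    {'x': np.array([4, 5, 6]), 'y': np.array([16, 25, 64])},
])
merged = merge_dict_arrays(dict_arr)
print(merged)
```

Because keys are picked up as they appear, the helper does not require every dict to contain every key.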

Since every dict has the same keys in the same order, we can work with the values alone:

In [54]: list(zip(*[list(d.values()) for d in dict_arr]))
Out[54]: [(array([1, 2, 3]), array([4, 5, 6])), (array([1, 4, 9]), array([16, 25, 64]))]

and consolidating:

In [63]: [np.hstack(i) for i in zip(*[list(d.values()) for d in dict_arr])]
Out[63]: [array([1, 2, 3, 4, 5, 6]), array([ 1,  4,  9, 16, 25, 64])]

This still needs to be put back into dict form:

In [67]: {k:v for k,v in zip(dict_arr[0], Out[63])}
Out[67]: {'x': array([1, 2, 3, 4, 5, 6]), 'y': array([ 1,  4,  9, 16, 25, 64])}
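
A variation on the same idea looks each key up explicitly instead of relying on the values coming out in the same order in every dict. It is a minimal sketch that assumes every dict contains all the keys of the first one; a dict missing a key would raise KeyError.

```python
import numpy as np

dict_arr = np.array([
    {'x': np.array([1, 2, 3]), 'y': np.array([1, 4, 9])},
    {'x': np.array([4, 5, 6]), 'y': np.array([16, 25, 64])},
])

# Take the key set from the first dict, then gather that key from every dict.
keys = dict_arr[0].keys()
merged = {k: np.concatenate([d[k] for d in dict_arr]) for k in keys}
print(merged)
```

This walks the array of dicts once per key rather than once overall, which is a small cost with only one to three keys.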

CodePudding user response:

I'm not 100% sure if this will be more performant for your use-case (in fact, there's a good chance it'll be slower since we're still looping...), but I recommend trying and timing it on your full data:

import pandas as pd
import numpy as np

dict_arr = np.array([{'x': np.array([1,2,3]), 'y': np.array([1,4,9])}, {'x': np.array([4,5,6]), 'y': np.array([16,25,64])}])

df = pd.DataFrame.from_records(dict_arr)
df.apply(np.hstack, result_type='reduce').to_dict()

# {'x': array([1, 2, 3, 4, 5, 6]), 'y': array([ 1,  4,  9, 16, 25, 64])}