I have this list, which contains lists of dictionaries:
my_list = [
[{'id': '1D1', 'name_id': 'Ethan', 'manager': 'John', 'employee_details': None},
{'id': '1D2', 'name_id': 'Kevin', 'manager': 'Helen',
'employee_details': "(EmployeeDetail(id='1D2.0', partner='Jenny', children = ['Eva']),)"},
{'id': '1D3', 'name_id': 'Richard', 'manager': 'Robert',
'employee_details': "(EmployeeDetail(id='1D3.0', partner='Roberta', children= ['Noah', 'Elvis']),)"}
],
[{'id': '1D4', 'name_id': 'Liam', 'manager': 'John', 'employee_details': None},
{'id': '1D5', 'name_id': 'William', 'manager': 'Benjamin',
'employee_details': "(EmployeeDetail(id='1D5.0', partner='Emma', children = ['Amelia']),)"}
]
]
Expected output, a list of arrays:
[array([['1D1', 'Ethan', 'John', None],
['1D2', 'Kevin', 'Helen',
"(EmployeeDetail(id='1D2.0', partner='Jenny', children = ['Eva']),)"],
['1D3', 'Richard', 'Robert',
"(EmployeeDetail(id='1D3.0', partner='Roberta', children= ['Noah', 'Elvis']),)"]],
dtype=object),
array([['1D4', 'Liam', 'John', None],
['1D5', 'William', 'Benjamin',
"(EmployeeDetail(id='1D5.0', partner='Emma', children = ['Amelia']),)"]],
dtype=object)]
I can achieve the expected output with the following code:
import pandas as pd
import numpy as np
my_arrays = [np.array(pd.DataFrame(l)) for l in my_list.__iter__()]
%timeit my_arrays = [np.array(pd.DataFrame(l)) for l in my_list.__iter__()]
881 µs ± 24.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
My problem is that I would like to do this more efficiently: my actual data has thousands of nested lists, and this approach takes a long time to run. Is there a better way to achieve this result? Thank you very much!
CodePudding user response:
I don't think it gets much faster than a nested list comprehension. pandas
isn't necessary here.
result = [np.array([list(d.values()) for d in sublist]) for sublist in my_list]
This assumes all your dictionaries have the same keys in the same order
(e.g. 'id' is always first).
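If the key order is not guaranteed, one option is to read the values by an explicit key list instead of relying on d.values(). The following is only a minimal sketch: KEYS and to_arrays are names made up for this example, it assumes every dictionary contains the four keys from the question, and dtype=object is passed explicitly so that a sublist without any None still yields an object array like the expected output.

```python
import numpy as np

# Hypothetical column order, taken from the sample data in the question.
KEYS = ('id', 'name_id', 'manager', 'employee_details')

def to_arrays(nested, keys=KEYS):
    """Build one 2-D object array per sublist, reading values by key."""
    return [
        np.array([[d[k] for k in keys] for d in sublist], dtype=object)
        for sublist in nested
    ]

sample = [
    [{'id': '1D1', 'name_id': 'Ethan', 'manager': 'John', 'employee_details': None},
     # Same keys in a different order, to show that the order no longer matters.
     {'manager': 'Helen', 'id': '1D2', 'name_id': 'Kevin', 'employee_details': None}],
    [{'id': '1D4', 'name_id': 'Liam', 'manager': 'John', 'employee_details': None}],
]

arrays = to_arrays(sample)
print(arrays[0].shape)  # (2, 4)
print(arrays[0][1])
```

A missing key raises KeyError here, which may be preferable to silently misaligned columns; d.get(k) could be used instead if missing values should become None.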
CodePudding user response:
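Just use a nested list comprehension. Note that this builds one 1-D array per dictionary, so the result is a list of lists of arrays rather than one 2-D array per sublist: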
out = [[np.array(list(d.values())) for d in lst] for lst in my_list]
On my machine,
%timeit -n 100000 [[np.array(list(d.values())) for d in lst] for lst in my_list]
16.4 µs ± 896 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit -n 100000 [np.array(pd.DataFrame(l)) for l in my_list.__iter__()]
1.1 ms ± 247 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
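To repeat this kind of measurement outside IPython, the standard-library timeit module can stand in for %timeit. The sketch below is an illustration only: it uses a shortened stand-in for the question's data, the function names per_sublist and per_dict are invented here, and it compares the two comprehension variants from the answers (pandas is left out to keep the example dependency-light). Absolute numbers will differ by machine.

```python
import timeit
import numpy as np

# Small stand-in for the question's data (employee_details shortened to None).
my_list = [
    [{'id': '1D1', 'name_id': 'Ethan', 'manager': 'John', 'employee_details': None},
     {'id': '1D2', 'name_id': 'Kevin', 'manager': 'Helen', 'employee_details': None}],
    [{'id': '1D4', 'name_id': 'Liam', 'manager': 'John', 'employee_details': None}],
]

def per_sublist():
    # One 2-D object array per sublist (the shape the question asks for).
    return [np.array([list(d.values()) for d in lst], dtype=object) for lst in my_list]

def per_dict():
    # One 1-D array per dictionary, grouped in plain Python lists.
    return [[np.array(list(d.values())) for d in lst] for lst in my_list]

for fn in (per_sublist, per_dict):
    # Best of 5 repeats of 1000 calls, reported per call in microseconds.
    best = min(timeit.repeat(fn, number=1000, repeat=5)) / 1000
    print(f"{fn.__name__}: {best * 1e6:.1f} us per call")
```

Taking the minimum over repeats is a common way to reduce noise from other processes; %timeit reports mean and standard deviation instead, so the figures are not directly comparable.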