I have a 2D array (about a million rows), not sorted:
[{"ID": "A123", "Data_A": "2.123", "Data_B": "abcd"}, ..., {"ID": "BI451", "Data_A": "2.123", "Data_B": "dcbz"}]
I need to find a row by "ID" and return it for processing. This is a one-time action: download the array, find ~100 rows, discard the array.
What is the quickest method? Will converting it to a numpy array or a pandas DataFrame make it quicker?
CodePudding user response:
You can use filter, which returns a lazy iterator:
a = [{"ID": "A123", "Data_A": "2.123", "Data_B": "abcd"}, {"ID": "BI451", "Data_A": "2.123", "Data_B": "dcbz"}]
gen_A123 = filter(lambda d: d['ID'] == 'A123', a)
You can then take only as many elements as you want (you get fewer if there are not enough matches):
from itertools import islice
list(islice(gen_A123, 100))
## OR (use one or the other: the iterator can only be consumed once)
for item in islice(gen_A123, 100):
    ...  # perform action
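If the ~100 rows you need have different IDs, running filter once per ID scans the list about 100 times. A single pass that checks each row against a set of wanted IDs may be cheaper. This is only a minimal sketch, assuming each ID appears at most once; `find_rows` and `wanted_ids` are made-up names for illustration:

```python
def find_rows(rows, wanted_ids):
    """Return {ID: row} for each wanted ID that is present, scanning rows once."""
    remaining = set(wanted_ids)  # set membership test is O(1) on average
    found = {}
    for row in rows:
        row_id = row["ID"]
        if row_id in remaining:
            found[row_id] = row
            remaining.discard(row_id)
            if not remaining:  # every wanted ID has been found, stop early
                break
    return found


a = [
    {"ID": "A123", "Data_A": "2.123", "Data_B": "abcd"},
    {"ID": "BI451", "Data_A": "2.123", "Data_B": "dcbz"},
]
# "ZZ999" is a hypothetical ID that is not in the data; it is simply left out.
rows = find_rows(a, ["BI451", "ZZ999"])
```

IDs that are not in the data are just absent from the result, so you can compare its keys with `wanted_ids` to see what was missing.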
CodePudding user response:
Try this. Building the dict is O(n) and done once; each lookup by ID afterwards is O(1) on average:
>>> lst_dct = [{"ID": "A123", "Data_A": "2.123", "Data_B": "abcd"},{"ID": "BI451", "Data_A": "2.123", "Data_B": "dcbz"}]
>>> dct_id = {ld['ID'] : ld for ld in lst_dct}
>>> dct_id
{'A123': {'ID': 'A123', 'Data_A': '2.123', 'Data_B': 'abcd'},
'BI451': {'ID': 'BI451', 'Data_A': '2.123', 'Data_B': 'dcbz'}}
>>> dct_id['A123']
{'ID': 'A123', 'Data_A': '2.123', 'Data_B': 'abcd'}
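One caveat with the dict comprehension: if the same ID can appear in more than one row, later rows silently overwrite earlier ones. Here is a sketch, assuming duplicates are possible, that keeps every row per ID. `rows_by_id` is a made-up name and the third row is invented purely to show a duplicate:

```python
from collections import defaultdict

lst_dct = [
    {"ID": "A123", "Data_A": "2.123", "Data_B": "abcd"},
    {"ID": "BI451", "Data_A": "2.123", "Data_B": "dcbz"},
    {"ID": "A123", "Data_A": "9.999", "Data_B": "wxyz"},  # invented duplicate ID
]

# Group rows by ID in one O(n) pass; each ID maps to a list of its rows.
rows_by_id = defaultdict(list)
for ld in lst_dct:
    rows_by_id[ld["ID"]].append(ld)

# .get() avoids creating an empty entry when the ID is missing.
matches = rows_by_id.get("A123", [])
```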