Python: fastest method to fetch a record from a 2D array by search value?

I have an unsorted 2D array (a list of dicts, roughly a million rows):

[{"ID": "A123", "Data_A": "2.123", "Data_B": "abcd"}, ..., {"ID": "BI451", "Data_A": "2.123", "Data_B": "dcbz"}]

I need to find rows by "ID" and return them for processing. This is a one-time action: download the array, find about 100 rows, then discard the array.

What is the quickest method? Would converting it to a NumPy array or a pandas DataFrame make it quicker?

CodePudding user response：

You can use filter, which returns a lazy iterator, so rows are only examined as you consume it:

a = [{"ID": "A123", "Data_A": "2.123", "Data_B": "abcd"}, {"ID": "BI451", "Data_A": "2.123", "Data_B": "dcbz"}]

gen_A123 = filter(lambda d: d['ID'] == 'A123', a)

You can then take only as many elements as you need (you get fewer if there are not that many matches):

from itertools import islice
list(islice(gen_A123, 100))

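## OR (a filter object can be consumed only once, so use one approach or the other)
for item in islice(gen_A123, 100):
    print(item)  # replace with your own processing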
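
If the ~100 rows you need have different IDs, the same lazy idea can be extended to a single pass that checks each row against a set of wanted IDs and stops as soon as all of them are found. This is only a minimal sketch, assuming IDs are unique in the array; find_rows and wanted_ids are names made up for illustration, not something from the question:

```python
def find_rows(rows, wanted_ids):
    """Return {id: row} for each wanted ID that exists, scanning rows once."""
    remaining = set(wanted_ids)  # set membership tests are O(1) on average
    found = {}
    for row in rows:
        row_id = row["ID"]
        if row_id in remaining:
            found[row_id] = row
            remaining.discard(row_id)
            if not remaining:  # every wanted ID has been located: stop early
                break
    return found


rows = [
    {"ID": "A123", "Data_A": "2.123", "Data_B": "abcd"},
    {"ID": "BI451", "Data_A": "2.123", "Data_B": "dcbz"},
]

# "ZZ999" is a hypothetical ID that is not in the data
result = find_rows(rows, ["BI451", "ZZ999"])
```

IDs that are not present are simply missing from the returned dict, so you can compare its keys with the wanted IDs to see what was not found.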

CodePudding user response：

Try this. Building the dictionary is O(n) and is done once; after that, each lookup by ID is O(1) on average:

>>> lst_dct = [{"ID": "A123", "Data_A": "2.123", "Data_B": "abcd"},{"ID": "BI451", "Data_A": "2.123", "Data_B": "dcbz"}]

>>> dct_id = {ld['ID'] : ld for ld in lst_dct}

>>> dct_id
{'A123': {'ID': 'A123', 'Data_A': '2.123', 'Data_B': 'abcd'},
 'BI451': {'ID': 'BI451', 'Data_A': '2.123', 'Data_B': 'dcbz'}}

>>> dct_id['A123']
{'ID': 'A123', 'Data_A': '2.123', 'Data_B': 'abcd'}
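
To fetch all ~100 rows at once, the same index can be queried in a loop or comprehension. A rough sketch, assuming IDs are unique (if an ID repeats, the dict comprehension keeps only the last row with that ID); the wanted list is made up for illustration:

```python
lst_dct = [
    {"ID": "A123", "Data_A": "2.123", "Data_B": "abcd"},
    {"ID": "BI451", "Data_A": "2.123", "Data_B": "dcbz"},
]

# Build the index once: one O(n) pass over the data
dct_id = {ld["ID"]: ld for ld in lst_dct}

# Hypothetical IDs to look up; "ZZ999" is not in the data
wanted = ["BI451", "A123", "ZZ999"]

# dict.get returns None instead of raising KeyError for a missing ID
rows = {wanted_id: dct_id.get(wanted_id) for wanted_id in wanted}
```

As for NumPy or pandas: building the array or DataFrame also has to walk every row once, so for a one-time lookup it probably won't beat a plain dict or a single pass, though it is worth timing on your real data.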