Essentially, I'm looking for the "correct/pythonic" way of matching 2 different dictionaries by the same key value while still giving me access to all of the keys of the matched dictionaries.
# It all starts with a single JSON file which has 3 lists of dicts; I want to parse 2 of them.
Brands_json_file = {
    "cars": [],     # notice it's a list of dicts
    "sellers": [],  # notice it's a list of dicts
    "prices": [],   # notice it's a list of dicts
    "database_name": "someDB",
    "database_id": "does not matter",
}
cars = [
    {
        "name": "bmw",
        "id": "xxxxxxxx",  # even though they're 2 separate dicts, I can associate both
        "doors": 4,        # because the id is the same
        "options": [],
    },
    {
        "name": "fiat",
        "id": "yyyyy",
        "doors": 2,
        "options": [],  # there's even more nested stuff
    },
]
sellers = [
    {
        "name": "Some place name Lda",
        "id": "xxxxxxxx",  # in this example this seller is the "seller of the BMW car"
        "distance": 300,
    },
    {
        "name": "Another location",
        "id": "yyyyy",
        "distance": 200,
        "km": 100,  # dicts are not all the same length
    },
]
So what I have been doing successfully is something like:
# I just loop over what I want after json.loads
brands_file = json.loads(......)
for car in brands_file['cars']:
    # I want to grab some car info
    car_name = car['name']
    car_doors = ...
    car_engine = ...
    for seller in brands_file['sellers']:
        if car['id'] == seller['id']:
            seller_name = ...
            seller_id = ...
            # logic is done, I just keep grabbing info from the seller
            # and I save everything for later use
There has to be a better way, right? It just feels wrong having to loop over BOTH lists a million times.
CodePudding user response:
If I understand your question correctly, you're asking how to make an "inner join" of the two lists of dicts on their id column. If the data were in SQL tables, this would be a one-liner.
Two loops are fine logically and also in practice if the tables are small. You correctly observe that they're wordy and also slow if the tables are big.
Just as a relational database would join efficiently with the benefit of indexes, you can do the same here.
An index is just a "reverse map" from the indexed column (id) to the containing object.
I'm not a wizard at python idioms, but this can look something like:
cars_ix = {row['id']: row for row in brands_file['cars']}
sellers_ix = {row['id']: row for row in brands_file['sellers']}

for id in cars_ix.keys() & sellers_ix.keys():
    car = cars_ix[id]
    seller = sellers_ix[id]
    # ... process matched car and seller here
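If you only care about cars and don't need to detect sellers without cars, you can skip one of the indexes and look sellers up directly with dict.get. A minimal runnable sketch (the sample data here is hypothetical, shaped like your brands_file):

```python
# hypothetical sample shaped like the question's brands_file
brands_file = {
    "cars": [
        {"name": "bmw", "id": "xxxxxxxx", "doors": 4},
        {"name": "fiat", "id": "zzzzz", "doors": 2},  # no matching seller
    ],
    "sellers": [
        {"name": "Some place name Lda", "id": "xxxxxxxx", "distance": 300},
    ],
}

# build one index, then do a single pass over the other list
sellers_ix = {row["id"]: row for row in brands_file["sellers"]}

for car in brands_file["cars"]:
    seller = sellers_ix.get(car["id"])
    if seller is None:
        continue  # this car has no matching seller
    print(car["name"], "sold by", seller["name"])
```

This keeps the inner-join semantics for the cars side while only building one index.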
As often happens, you're trading space (and the time to build the indexes) for better asymptotic time when processing.
I'm sure there's a library to simplify this, and Pandas is probably it.
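For instance, with pandas the join really is a one-liner: DataFrame.merge performs an inner join on the shared column by default. A minimal sketch (the sample rows and the _car/_seller suffixes are my own choices, not from the question):

```python
import pandas as pd

# hypothetical sample shaped like the question's data
cars = [
    {"name": "bmw", "id": "xxxxxxxx", "doors": 4},
    {"name": "fiat", "id": "yyyyy", "doors": 2},
]
sellers = [
    {"name": "Some place name Lda", "id": "xxxxxxxx", "distance": 300},
    {"name": "Another location", "id": "yyyyy", "distance": 200},
]

# inner join on "id"; suffixes disambiguate the overlapping "name" column
matched = pd.DataFrame(cars).merge(
    pd.DataFrame(sellers), on="id", suffixes=("_car", "_seller")
)
print(matched[["id", "name_car", "name_seller", "doors", "distance"]])
```

Columns that exist on only one side (like km in the question's data) simply come through as NaN for the rows that lack them.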