Essentially, I'm looking for the "correct/pythonic" way of matching 2 different dictionaries by the same key value while still giving me access to all of the keys of the matched dictionaries.
# It all starts with a single JSON file which has 3 lists of dicts; I want to parse 2 of them.
Brands_json_file = {
    "cars": [],     # notice it's a list of dicts
    "sellers": [],  # notice it's a list of dicts
    "prices": [],   # notice it's a list of dicts
    "database_name": "someDB",
    "database_id": "does not matter",
}
cars = [
    {
        "name": "bmw",
        "id": "xxxxxxxx",  # even though they're 2 separate dicts, I can associate both
        "doors": 4,        # because the id is the same
        "options": [],
    },
    {
        "name": "fiat",
        "id": "yyyyy",
        "doors": 2,
        "options": [],  # there's even more nested stuff
    },
]
sellers = [
    {
        "name": "Some place name Lda",
        "id": "xxxxxxxx",  # in this example this seller is the "seller of the BMW car"
        "distance": 300,
    },
    {
        "name": "Another location",
        "id": "yyyyy",
        "distance": 200,
        "km": 100,  # dicts are not all the same length
    },
]
So what I have been doing successfully is something like:
# I just loop over what I want after json.loads
brands_file = json.loads(......)
for car in brands_file['cars']:
    # I want to grab some car info
    car_name = car['name']
    car_doors = ...
    car_engine = ...
    for seller in brands_file['sellers']:
        if car['id'] == seller['id']:
            seller_name = ...
            seller_id = ...
            # logic is done, I just keep grabbing info from the seller
            # and I save everything for later use
There has to be a better way, right? It just feels wrong having to loop over BOTH lists a million times.
CodePudding user response:
If I understand your question correctly, you're asking how to make an "inner join" of the two lists of dicts on their id column. If the data were in SQL tables, this would be a one-liner.
Two loops are fine logically and also in practice if the tables are small. You correctly observe that they're wordy and also slow if the tables are big.
Just as a relational database would join efficiently with the benefit of indexes, you can do the same here.
An index is just a "reverse map" from the indexed column (id) to the containing object.
I'm not a wizard at python idioms, but this can look something like:
cars_ix = {row['id']: row for row in brands_file['cars']}
sellers_ix = {row['id']: row for row in brands_file['sellers']}

for id in cars_ix.keys() & sellers_ix.keys():
    car = cars_ix[id]
    seller = sellers_ix[id]
    # ... process matched car and seller here
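If you only care about cars and don't need to detect sellers without cars, you can skip one of the indexes and look sellers up directly with dict.get. A minimal runnable sketch (the sample data here is hypothetical, shaped like your brands_file):

```python
# hypothetical sample shaped like the question's brands_file
brands_file = {
    "cars": [
        {"name": "bmw", "id": "xxxxxxxx", "doors": 4},
        {"name": "fiat", "id": "zzzzz", "doors": 2},  # no matching seller
    ],
    "sellers": [
        {"name": "Some place name Lda", "id": "xxxxxxxx", "distance": 300},
    ],
}

# build one index, then do a single pass over the other list
sellers_ix = {row["id"]: row for row in brands_file["sellers"]}

for car in brands_file["cars"]:
    seller = sellers_ix.get(car["id"])
    if seller is None:
        continue  # this car has no matching seller
    print(car["name"], "sold by", seller["name"])
```

This keeps the inner-join semantics for the cars side while only building one index.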
As often happens, you're trading space (and the time to build the indexes) for better asymptotic time when processing.
I'm sure there's a library to simplify this, and Pandas is probably it.
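For instance, with pandas the join really is a one-liner: DataFrame.merge performs an inner join on the shared column by default. A minimal sketch (the sample rows and the _car/_seller suffixes are my own choices, not from the question):

```python
import pandas as pd

# hypothetical sample shaped like the question's data
cars = [
    {"name": "bmw", "id": "xxxxxxxx", "doors": 4},
    {"name": "fiat", "id": "yyyyy", "doors": 2},
]
sellers = [
    {"name": "Some place name Lda", "id": "xxxxxxxx", "distance": 300},
    {"name": "Another location", "id": "yyyyy", "distance": 200},
]

# inner join on "id"; suffixes disambiguate the overlapping "name" column
matched = pd.DataFrame(cars).merge(
    pd.DataFrame(sellers), on="id", suffixes=("_car", "_seller")
)
print(matched[["id", "name_car", "name_seller", "doors", "distance"]])
```

Columns that exist on only one side (like km in the question's data) simply come through as NaN for the rows that lack them.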