So I have a dictionary that looks something like the following:
{
"tigj09j32f0j2": {
"car": {
"lead": {
"version": "1.1"
}
},
"bike": {
"lead": {
"version": "2.2"
}
},
"jet_ski": {
"lead": {
"version": "3.3"
}
}
},
"fj983j2r9jfjf": {
"car": {
"lead": {
"version": "1.1"
}
},
"bike": {
"lead": {
"version": "2.3"
}
},
"jet_ski": {
"lead": {
"version": "3.3"
}
}
}
}
The number of different dictionaries that contain car
, bike
and jet_ski
can be huge and not just two as in my example. The number of different vehicle types can also be much larger. My goal is to find a mismatch in a given type of vehicle version
between the different dictionaries. For example for bike
the version
is different between the two dictionaries.
The way I currently do it is by iterating through all sub-dictionaries in my dictionary and then looking for the version. I save the version in a class dictionary that contains the vehicle type and version and then start comparing to it. I am sure there is a much more elegant and pythonic way to go about this and would appreciate any feedback!
Here is more or less what I am doing:
def is_version_issue(vehicle_type: str, object_json: dict):
issue = False
for object_id in object_json:
current_object = object_json.get(object_id)
if vehicle_type in current_object:
current_vehicle_version = current_object.get(vehicle_type).get("lead").get("version")
# vehicles is a class dictionary that contains the vehicles I am looking for
if self.vehicles[vehicle_type]:
if self.vehicles[vehicle_type] == current_vehicle_version:
issue = False
continue
else:
return True
self.vehicles[vehicle_type] = current_vehicle_version
issue = False
return issue
CodePudding user response:
Well, your solution is not bad. There are a few things I would suggest to improve.
Iterate over sub-dictionaries directly
You don't seem to use the keys (object_id
) at all, so you might as well iterate via dict.values
.
No need for the issue
variable
You can just return your flag once an "issue" is found and otherwise return the opposite at the end of the loop.
Reduce indentation
Use continue
in the loop, if the vehicle_type
is not present to reduce indentation.
Decide what assumptions are sensible
If you know that each vehicle sub-dictionary will have the lead
key and the one below that will have the version
key (which you imply by using dict.get
multiple times without checking for None
first), just use regular dictionary subscript notation ([]
).
No need for the class dictionary
If you are checking a specific vehicle type when calling your function anyway, there is no need for that dictionary (as far as I can tell). You just need a local variable to hold the last known version number for that type.
Semantics
This may be a matter of personal preference, but I would design the function to return True
, if everything is "fine" and False
if there is a mismatch somewhere.
Specify type arguments
If you already take the time to use type annotations, you should take the time to specify your generics properly. Granted, in this case it may become unwieldy, if your dictionary nesting gets much deeper, but in that case you can still at least use dict[str, Any]
.
Use constants for repeating keys
To reduce the room for error, I like to define constants for strings that have fixed meaning in my code and are used repeatedly. That schema seems to be more or less fixed, so you can define the keys once and then use the constants throughout the code. This has the added benefit that it will be very easy to fix, if the schema does change for some reason and one of the keys is renamed (e.g. from version
to ver
or something like that).
Obviously, in this super simple situation this is overkill, but if you refer to the same keys in more places throughout your code, I highly suggest adopting this practice.
Suggested implementation
KEY_LEAD = "lead"
KEY_VERSION = "version"
def versions_consistent(
vehicle_type: str,
data: dict[str, dict[str, dict[str, dict[str, str]]]]
) -> bool:
version_found: str | None = None
for vehicles in data.values():
vehicle = vehicles.get(vehicle_type)
if vehicle is None:
continue
if version_found is None:
version_found = vehicle[KEY_LEAD][KEY_VERSION]
elif version_found != vehicle[KEY_LEAD][KEY_VERSION]:
return False
return True
Bonus
You might consider performing an additional check at the end, to see if version_found
is still None
. That might indicate that an invalid vehicle_type
was passed (e.g. due to a typo). In that case you could raise an exception.
As an alternative, if you know the vehicle types in advance, you can avoid this by defining them again as constants ahead of time and then checking at the beginning of the function, if a valid type was passed.
Finally, you could consider not just returning a bool
, but actually saving the mismatches/inconsistencies in some data structure and returning that to indicate which IDs had which versions for a specified vehicle type.
So it could also look something like this:
ALLOWED_VEHICLES = {"car", "bike", "jet_ski"}
def get_version_id_mapping(
vehicle_type: str,
data: dict[str, dict[str, dict[str, dict[str, str]]]]
) -> dict[str, set[str]]:
if vehicle_type not in ALLOWED_VEHICLES:
raise ValueError(f"{vehicle_type} is not a valid vehicle type")
version_id_map: dict[str, set[str]] = {}
for obj_id, vehicles in data.items():
vehicle = vehicles.get(vehicle_type)
if vehicle is None:
continue
ids = version_id_map.setdefault(vehicle["lead"]["version"], set())
ids.add(obj_id)
return version_id_map
Calling get_version_id_mapping("bike", d)
(d
being your example data) gives the following:
{'2.2': {'tigj09j32f0j2'}, '2.3': {'fj983j2r9jfjf'}}
Calling it for jet_ski
gives this:
{'3.3': {'fj983j2r9jfjf', 'tigj09j32f0j2'}}
So by checking the length of the output dictionary, you would see, if there is an inconsistency (length > 1
) or not.
Bonus 2
Come to think of it, if you want to do this check for every type of vehicle for the entire dataset anyway, this can all be done in one go:
def vehicle_type_versions(
data: dict[str, dict[str, dict[str, dict[str, str]]]]
) -> dict[str, dict[str, set[str]]]:
output: dict[str, dict[str, set[str]]] = {}
for obj_id, vehicles in data.items():
for vehicle_type, vehicle_data in vehicles.items():
sub_dict = output.setdefault(vehicle_type, {})
ids = sub_dict.setdefault(vehicle_data["lead"]["version"], set())
ids.add(obj_id)
return output
Calling this on your example data yield the following output:
{'bike': {'2.2': {'tigj09j32f0j2'}, '2.3': {'fj983j2r9jfjf'}}, 'car': {'1.1': {'fj983j2r9jfjf', 'tigj09j32f0j2'}}, 'jet_ski': {'3.3': {'fj983j2r9jfjf', 'tigj09j32f0j2'}}}
CodePudding user response:
def is_version_issue(vehicle_type: str, object_json: dict):
current_object = object_json[object_id]
for object_id in current_object:
if vehicle_type in object_json[object_id]:
current_vehicle_version = current_object[vehicle_type]["lead"]["version"]
# vehicles is a class dictionary that contains the vehicles I am looking for
if self.vehicles[vehicle_type]:
if self.vehicles[vehicle_type] != current_vehicle_version:
return True
return False
No I think this is the most logical way I think you have some redundancy but this makes sense to me. Aslo some other problems with you not being clear about how you are using the dict.
I would also not use the temp current_ variables but they are good if you are using a debugger.