Best way to find a mismatched value that can exist in different locations in a nested dictionary-CodePudding

So I have a dictionary that looks something like the following:

{
    "tigj09j32f0j2": {
        "car": {
            "lead": {
                "version": "1.1"
            }
        },
        "bike": {
            "lead": {
                "version": "2.2"
            }
        },
        "jet_ski": {
            "lead": {
                "version": "3.3"
            }
        }
    },
    "fj983j2r9jfjf": {
        "car": {
            "lead": {
                "version": "1.1"
            }
        },
        "bike": {
            "lead": {
                "version": "2.3"
            }
        },
        "jet_ski": {
            "lead": {
                "version": "3.3"
            }
        }
    }
}

The number of different dictionaries that contain car, bike and jet_ski can be huge and not just two as in my example. The number of different vehicle types can also be much larger. My goal is to find a mismatch in a given type of vehicle version between the different dictionaries. For example for bike the version is different between the two dictionaries.

The way I currently do it is by iterating through all sub-dictionaries in my dictionary and then looking for the version. I save the version in a class dictionary that contains the vehicle type and version and then start comparing to it. I am sure there is a much more elegant and pythonic way to go about this and would appreciate any feedback!

Here is more or less what I am doing:

def is_version_issue(vehicle_type: str, object_json: dict):
    issue = False
    for object_id in object_json:
        current_object = object_json.get(object_id)
        if vehicle_type in current_object:
            current_vehicle_version = current_object.get(vehicle_type).get("lead").get("version")
            # vehicles is a class dictionary that contains the vehicles I am looking for
            if self.vehicles[vehicle_type]:
                if self.vehicles[vehicle_type] == current_vehicle_version:
                    issue = False
                    continue
                else:
                    return True
            self.vehicles[vehicle_type] = current_vehicle_version
            issue = False
    return issue

CodePudding user response：

Well, your solution is not bad. There are a few things I would suggest to improve.

Iterate over sub-dictionaries directly

You don't seem to use the keys (object_id) at all, so you might as well iterate via dict.values.

No need for the `issue` variable

You can just return your flag once an "issue" is found and otherwise return the opposite at the end of the loop.

Reduce indentation

Use continue in the loop, if the vehicle_type is not present to reduce indentation.

Decide what assumptions are sensible

If you know that each vehicle sub-dictionary will have the lead key and the one below that will have the version key (which you imply by using dict.get multiple times without checking for None first), just use regular dictionary subscript notation ([]).

No need for the class dictionary

If you are checking a specific vehicle type when calling your function anyway, there is no need for that dictionary (as far as I can tell). You just need a local variable to hold the last known version number for that type.

Semantics

This may be a matter of personal preference, but I would design the function to return True, if everything is "fine" and False if there is a mismatch somewhere.

Specify type arguments

If you already take the time to use type annotations, you should take the time to specify your generics properly. Granted, in this case it may become unwieldy, if your dictionary nesting gets much deeper, but in that case you can still at least use dict[str, Any].

Use constants for repeating keys

To reduce the room for error, I like to define constants for strings that have fixed meaning in my code and are used repeatedly. That schema seems to be more or less fixed, so you can define the keys once and then use the constants throughout the code. This has the added benefit that it will be very easy to fix, if the schema does change for some reason and one of the keys is renamed (e.g. from version to ver or something like that).

Obviously, in this super simple situation this is overkill, but if you refer to the same keys in more places throughout your code, I highly suggest adopting this practice.

Suggested implementation

KEY_LEAD = "lead"
KEY_VERSION = "version"

def versions_consistent(
    vehicle_type: str,
    data: dict[str, dict[str, dict[str, dict[str, str]]]]
) -> bool:
    version_found: str | None = None
    for vehicles in data.values():
        vehicle = vehicles.get(vehicle_type)
        if vehicle is None:
            continue
        if version_found is None:
            version_found = vehicle[KEY_LEAD][KEY_VERSION]
        elif version_found != vehicle[KEY_LEAD][KEY_VERSION]:
            return False
    return True

Bonus

You might consider performing an additional check at the end, to see if version_found is still None. That might indicate that an invalid vehicle_type was passed (e.g. due to a typo). In that case you could raise an exception.

As an alternative, if you know the vehicle types in advance, you can avoid this by defining them again as constants ahead of time and then checking at the beginning of the function, if a valid type was passed.

Finally, you could consider not just returning a bool, but actually saving the mismatches/inconsistencies in some data structure and returning that to indicate which IDs had which versions for a specified vehicle type.

So it could also look something like this:

ALLOWED_VEHICLES = {"car", "bike", "jet_ski"}

def get_version_id_mapping(
    vehicle_type: str,
    data: dict[str, dict[str, dict[str, dict[str, str]]]]
) -> dict[str, set[str]]:
    if vehicle_type not in ALLOWED_VEHICLES:
        raise ValueError(f"{vehicle_type} is not a valid vehicle type")
    version_id_map: dict[str, set[str]] = {}
    for obj_id, vehicles in data.items():
        vehicle = vehicles.get(vehicle_type)
        if vehicle is None:
            continue
        ids = version_id_map.setdefault(vehicle["lead"]["version"], set())
        ids.add(obj_id)
    return version_id_map

Calling get_version_id_mapping("bike", d) (d being your example data) gives the following:

{'2.2': {'tigj09j32f0j2'}, '2.3': {'fj983j2r9jfjf'}}

Calling it for jet_ski gives this:

{'3.3': {'fj983j2r9jfjf', 'tigj09j32f0j2'}}

So by checking the length of the output dictionary, you would see, if there is an inconsistency (length > 1) or not.

Bonus 2

Come to think of it, if you want to do this check for every type of vehicle for the entire dataset anyway, this can all be done in one go:

def vehicle_type_versions(
    data: dict[str, dict[str, dict[str, dict[str, str]]]]
) -> dict[str, dict[str, set[str]]]:
    output: dict[str, dict[str, set[str]]] = {}
    for obj_id, vehicles in data.items():
        for vehicle_type, vehicle_data in vehicles.items():
            sub_dict = output.setdefault(vehicle_type, {})
            ids = sub_dict.setdefault(vehicle_data["lead"]["version"], set())
            ids.add(obj_id)
    return output

Calling this on your example data yield the following output:

{'bike': {'2.2': {'tigj09j32f0j2'}, '2.3': {'fj983j2r9jfjf'}},
 'car': {'1.1': {'fj983j2r9jfjf', 'tigj09j32f0j2'}},
 'jet_ski': {'3.3': {'fj983j2r9jfjf', 'tigj09j32f0j2'}}}