Ordering a list coming from a Json File using a variable number of keys

Time:05-10

This post ended up a bit long; hopefully I explained myself, as the problem does require detail. Let me know how to improve it.


The summary of the problem is:

  • A routine reads a JSON file and turns it into a Python list using the json library.
  • The entries of the list are, of course, dictionaries. I have no say in how the original data is structured, and although the data can be manipulated inside the routine, the output format must be the same as the input format. The JSON is uniform: every entry always has the same format.
  • The type of the value differs from key to key: one value can be a string, others can be a dictionary, a boolean or a list (see the example below).
  • The idea is to order the list by a set of keys, but the number of keys is variable (user dependent). One user may want to order by a single key, another by two keys, and another by three.
  • Not only that, the keys themselves might change. For example, one user might want to order by keys A, B, C, while another might want to order by keys B, D.
  • Taking the two points above into consideration, a recursive ordering might be required, because the key in question might refer to the dictionary stored in a key's value rather than to the top-level dictionary in the list.
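As a minimal sketch of the first three points, this is roughly what the json library produces. The inline `raw` string is a made-up stand-in for the real file; with an actual file one would typically use `json.load` on the open file object instead.

```python
import json

# Hypothetical one-entry stand-in for the real JSON file contents.
raw = ('[{"district": "Cave", "availability": true, '
       '"details": {"kids": false}, "preferences": ["Travel", "Games"]}]')

people = json.loads(raw)   # JSON array -> Python list
entry = people[0]          # JSON object -> Python dict

# JSON true/false become Python booleans, nested objects become nested dicts.
print(type(people).__name__, type(entry).__name__)
print(type(entry["details"]).__name__, type(entry["preferences"]).__name__)

# Dumping and reloading gives back an equal structure, so sorting the list
# does not change the format of the individual entries.
round_trip = json.loads(json.dumps(people))
```

Since sorting only rearranges the list, each entry keeps the same shape it had in the input.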

Let's go into the details.

First, a sample of the data. This is both a simplification and an illustrative example, as I can't share the actual data in question. When I load the JSON file, the list looks something like this:

[ {'district': 'Cave', 'profession': 'Teacher', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Mountain', 'profession': 'Baker', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Castle', 'profession': 'Professor', 'details': {'gender': 'F', 'status': 'Single', 'kids': False}, 'availability': False, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

[ {'district': 'Dungeon', 'profession': 'Professor', 'details': {'gender': 'M', 'status': 'Married', 'kids': True}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

[ {'district': 'Castle', 'profession': 'Policeman', 'details': {'gender': 'NA', 'status': 'Single', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

[ {'district': 'Dungeon', 'profession': 'Policeman', 'details': {'gender': 'NA', 'status': 'Single', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Cave', 'profession': 'Secretary', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

This data sample reflects the kinds of cases I can encounter in my real-life problem:

  • District: one of a given number of district names. There will be a lot of repetition; multiple entries will have the same district.
  • Profession: same as above.
  • Details: a dictionary that can contain strings and booleans, as shown.
  • Availability: a boolean.
  • Preferences: a list of preferences that, as shown, will be the same in most cases (this makes sense in our real-life example). This list can be shorter or longer. I include it because it is part of my real-life example, but we can consider this entry low priority for solving the problem. I would like to focus on the rest of the keys.

It is important to say that each key will have repetition: a lot of entries will have the same district, others will have the same profession, and details like gender and kids will of course overlap between entries.

So, given the data, if I want to order only by key = district, the result should be this (correct me if you see a mistake and I will edit; I'm doing this example by hand):


[ {'district': 'Castle', 'profession': 'Professor', 'details': {'gender': 'F', 'status': 'Single', 'kids': False}, 'availability': False, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

[ {'district': 'Castle', 'profession': 'Policeman', 'details': {'gender': 'NA', 'status': 'Single', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

[ {'district': 'Cave', 'profession': 'Secretary', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Cave', 'profession': 'Teacher', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]


[ {'district': 'Dungeon', 'profession': 'Professor', 'details': {'gender': 'M', 'status': 'Married', 'kids': True}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

[ {'district': 'Dungeon', 'profession': 'Policeman', 'details': {'gender': 'NA', 'status': 'Single', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Mountain', 'profession': 'Baker', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

For district and profession it would be:


[ {'district': 'Castle', 'profession': 'Policeman', 'details': {'gender': 'NA', 'status': 'Single', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

[ {'district': 'Castle', 'profession': 'Professor', 'details': {'gender': 'F', 'status': 'Single', 'kids': False}, 'availability': False, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]



[ {'district': 'Cave', 'profession': 'Secretary', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Cave', 'profession': 'Teacher', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Dungeon', 'profession': 'Policeman', 'details': {'gender': 'NA', 'status': 'Single', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Dungeon', 'profession': 'Professor', 'details': {'gender': 'M', 'status': 'Married', 'kids': True}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]


[ {'district': 'Mountain', 'profession': 'Baker', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

But now I want to order by profession and gender (keep in mind that gender is inside a dictionary). The result would look like this:


[ {'district': 'Mountain', 'profession': 'Baker', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Castle', 'profession': 'Policeman', 'details': {'gender': 'NA', 'status': 'Single', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

[ {'district': 'Dungeon', 'profession': 'Policeman', 'details': {'gender': 'NA', 'status': 'Single', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Castle', 'profession': 'Professor', 'details': {'gender': 'F', 'status': 'Single', 'kids': False}, 'availability': False, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

[ {'district': 'Dungeon', 'profession': 'Professor', 'details': {'gender': 'M', 'status': 'Married', 'kids': True}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

[ {'district': 'Cave', 'profession': 'Secretary', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Cave', 'profession': 'Teacher', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

And another person would like to order only by status and kids (again, these are inside the dictionary):

[ {'district': 'Dungeon', 'profession': 'Professor', 'details': {'gender': 'M', 'status': 'Married', 'kids': True}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

[ {'district': 'Mountain', 'profession': 'Baker', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Cave', 'profession': 'Secretary', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Cave', 'profession': 'Teacher', 'details': {'gender': 'F', 'status': 'Married', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Castle', 'profession': 'Professor', 'details': {'gender': 'F', 'status': 'Single', 'kids': False}, 'availability': False, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

[ {'district': 'Dungeon', 'profession': 'Policeman', 'details': {'gender': 'NA', 'status': 'Single', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading'] } ]

[ {'district': 'Castle', 'profession': 'Policeman', 'details': {'gender': 'NA', 'status': 'Single', 'kids': False}, 'availability': True, 'preferences': ['Travel', 'Games', 'Food', 'Reading']  }]

And so on and so forth; I don't want to fill the post with examples only.

I know I can order the list using a lambda function. For example, if the list is called people, the code to obtain the first result would be:

people_sorted = sorted(people, key=lambda k: k['district'])

And for the second one would be:

people_sorted = sorted(people, key=lambda k: (k['district'],k['profession']))

And for the third example it would be:

people_sorted = sorted(people, key=lambda k: (k['profession'], k['details']['gender']))

And the last one would be:

people_sorted = sorted(people, key=lambda k: (k['details']['status'], k['details']['kids']))

Of course I can add as many parameters as needed. For example, if I need to order by gender, profession and district, I can use:

people_sorted = sorted(people, key=lambda k: (k['details']['gender'], k['profession'], k['district']))
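The tuple works as a key because Python compares tuples element by element: the first item decides the order, and later items only break ties. A small sketch with a trimmed-down, made-up version of the sample data:

```python
# Hypothetical, shortened records; only the fields used for sorting are kept.
people = [
    {'district': 'Cave', 'profession': 'Teacher', 'details': {'gender': 'F'}},
    {'district': 'Castle', 'profession': 'Professor', 'details': {'gender': 'F'}},
    {'district': 'Castle', 'profession': 'Policeman', 'details': {'gender': 'NA'}},
]

# District is compared first; profession is only consulted when districts tie.
people_sorted = sorted(people, key=lambda k: (k['district'], k['profession']))

order = [(p['district'], p['profession']) for p in people_sorted]
print(order)
# [('Castle', 'Policeman'), ('Castle', 'Professor'), ('Cave', 'Teacher')]
```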

I know you can also define the key function separately, but although I'm open to any suggestion, I'm trying to keep this question as short as possible, so I'm including only the lambda method.

The problem you have probably recognized here is how to create the tuple in the lambda function based on the user input. I don't know if 'dynamically' is the correct term, but the question would be:

If I have a given number of keys A1, A2, A3...AN, where N can change, how do I properly obtain:

people_sorted = sorted(people, key=lambda k: (k['A1'], k['A2'],... k['AN']))

With the added difficulty that, for example, AM can refer to a key inside the dictionary called 'details', so the actual code might look like this:

people_sorted = sorted(people, key=lambda k: (k['A1'], k['A2'],..., k['details']['AM'] ,..., k['AN']))
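To make the goal concrete, here is a rough sketch of one possible shape for such a call. It assumes nested keys are written as dotted paths such as 'details.gender'; that notation and the names `get_path` and `sort_by` are invented for this illustration and do not come from any library.

```python
from functools import reduce

def get_path(record, path):
    """Follow a dotted path such as 'details.gender' through nested dicts."""
    return reduce(lambda value, key: value[key], path.split('.'), record)

def sort_by(records, paths):
    """Sort records by any number of dotted paths, in priority order."""
    return sorted(records, key=lambda record: tuple(get_path(record, p) for p in paths))

# Hypothetical, shortened records.
people = [
    {'district': 'Dungeon', 'profession': 'Professor', 'details': {'gender': 'M'}},
    {'district': 'Castle', 'profession': 'Professor', 'details': {'gender': 'F'}},
    {'district': 'Mountain', 'profession': 'Baker', 'details': {'gender': 'F'}},
]

result = sort_by(people, ['profession', 'details.gender'])
districts = [p['district'] for p in result]
print(districts)  # ['Mountain', 'Castle', 'Dungeon']
```

The list of paths can have any length, so the tuple is built from the user input rather than written out by hand.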

Again:

I used the lambda example for the sake of brevity; the post itself is already very long due to the examples.

We didn't touch the list example (preferences), which I included to reflect the actual data, but it seems to me I can solve it once the other cases are solved.


Hopefully I explained myself, and apologies for how long the post ended up being.

Thanks

CodePudding user response:

If I were looking for a quick solution, I'd start with a mapping from field names to functions that extract that field:

field_name_mapping = {
     "gender": lambda record: record['details']['gender'],
     "district": lambda record: record['district'],
     # ... one entry per field the user may sort by
}

Then, when the user types in "gender, district" (or however input is done in your system), you break that up into tokens and create a list of functions:

function_list = [field_name_mapping[token] for token in parsed_user_input]

where the precise details of how you parse the user input are up to you.

Finally, you return:

sorted(people, key=lambda record: [f(record) for f in function_list])
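Putting those pieces together, a minimal end-to-end sketch might look like the following. The comma-splitting in `sort_people`, the function name itself, and the shortened records are assumptions made for the example; the real input parsing is up to your system.

```python
# Hypothetical, shortened records.
people = [
    {'district': 'Cave', 'profession': 'Teacher', 'details': {'gender': 'F'}},
    {'district': 'Dungeon', 'profession': 'Professor', 'details': {'gender': 'M'}},
    {'district': 'Castle', 'profession': 'Policeman', 'details': {'gender': 'NA'}},
    {'district': 'Castle', 'profession': 'Professor', 'details': {'gender': 'F'}},
]

# Each sortable field name maps to a function that extracts it from a record.
field_name_mapping = {
    'district': lambda record: record['district'],
    'profession': lambda record: record['profession'],
    'gender': lambda record: record['details']['gender'],
}

def sort_people(records, user_input):
    """Sort by the comma-separated field names in user_input, in that order."""
    tokens = [token.strip() for token in user_input.split(',') if token.strip()]
    # An unknown field name raises KeyError here, before any sorting happens.
    function_list = [field_name_mapping[token] for token in tokens]
    return sorted(records, key=lambda record: [f(record) for f in function_list])

result = sort_people(people, 'gender, district')
summary = [(p['details']['gender'], p['district']) for p in result]
print(summary)
# [('F', 'Castle'), ('F', 'Cave'), ('M', 'Dungeon'), ('NA', 'Castle')]
```

Keeping the mapping explicit means nested fields need no special handling at sort time, and only fields you list can be requested.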

CodePudding user response:

I would solve this with several small functions and classes. First, make a custom dictionary that returns items from its sub-dictionaries, if there are any.

from collections import UserDict

class custom_dict(UserDict):
    def __getitem__(self, key):
        # Top-level keys are returned directly.
        if key in self.keys():
            return super().__getitem__(key)
        # Otherwise search any nested custom_dict values for the key.
        for inner_key in self.keys():
            if isinstance(self[inner_key], type(self)):
                try:
                    return self[inner_key][key]
                except KeyError:
                    pass
        raise KeyError(f"{key} is not in dictionary")

Then make a hook for the json.loads function:

def as_custom_dict(dct):
    return custom_dict(dct)

Then, when you read the JSON, pass as_custom_dict as the object_hook argument. This turns every dict created by json.load or json.loads into an instance of the custom dict class.

import json
# values is the JSON text read from the file
json_values = json.loads(values, object_hook=as_custom_dict)
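For reference, json.loads calls object_hook once for every JSON object it decodes, and a nested object is finished before the object that contains it. That is why sub-dictionaries such as 'details' are already wrapped by the time the outer record is built. A tiny sketch with a hypothetical recording hook (`recording_hook` and `calls` are made-up names):

```python
import json

calls = []

def recording_hook(dct):
    # Remember which keys each decoded object had, then return it unchanged.
    calls.append(sorted(dct.keys()))
    return dct

record = json.loads(
    '{"district": "Cave", "details": {"gender": "F", "kids": false}}',
    object_hook=recording_hook,
)

print(calls)
# [['gender', 'kids'], ['details', 'district']]
```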

Next, make a function that takes a dictionary and the list of keys to sort by, and returns a tuple of the corresponding values:

def get_item_from_dict(dct,keys_to_find):
    x = []
    for key in keys_to_find:
        x.append(dct[key])
    return tuple(x)

Finally, make a function that takes the JSON values and the sort keys as a comma-separated string, and returns the sorted list:

from functools import partial

def order_by(json_values, args):
    # Split "gender, district" into ['gender', 'district'], ignoring stray spaces.
    arguments = [key.strip() for key in args.split(',')]

    lambda_to_find = partial(get_item_from_dict, keys_to_find=arguments)
    return sorted(json_values, key=lambda_to_find)

Together, this should allow for what you want.
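As a rough end-to-end check, the pieces can be combined like this. The class is a condensed version of the one above, the three records are made up, and passing `default=` to json.dumps is an assumption about how you might write the result back out: UserDict is not a dict subclass, so json.dumps cannot serialize it without help.

```python
import json
from collections import UserDict
from functools import partial

class custom_dict(UserDict):
    """Dict that also finds keys stored one or more levels down."""
    def __getitem__(self, key):
        if key in self.data:
            return self.data[key]
        for value in self.data.values():
            if isinstance(value, custom_dict):
                try:
                    return value[key]
                except KeyError:
                    pass
        raise KeyError(f"{key} is not in dictionary")

def get_item_from_dict(dct, keys_to_find):
    return tuple(dct[key] for key in keys_to_find)

def order_by(json_values, args):
    arguments = [key.strip() for key in args.split(',')]
    return sorted(json_values, key=partial(get_item_from_dict, keys_to_find=arguments))

# Hypothetical, shortened JSON input.
values = '''[
  {"district": "Cave", "details": {"status": "Married", "kids": false}},
  {"district": "Castle", "details": {"status": "Single", "kids": false}},
  {"district": "Dungeon", "details": {"status": "Married", "kids": true}}
]'''

json_values = json.loads(values, object_hook=custom_dict)
result = order_by(json_values, 'status, kids')
districts = [entry['district'] for entry in result]
print(districts)  # ['Cave', 'Dungeon', 'Castle']

# Unwrap each custom_dict back to its plain dict when serializing.
output_text = json.dumps(result, default=lambda obj: obj.data)
```

Note that False sorts before True, and that a key name appearing in more than one sub-dictionary would be ambiguous with this lookup.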
