Python: filter a nested dict

I am a beginner at Python and am trying to figure out the best way to filter a dict. I have read about several ways to do this, but none does exactly what I want. I have the dict below:

{
    "clients": [{
        "name": "John A",
        "Age": "27",
        "data": {
            "gender": "Male",
            "height": "6'2"
            }
        },
        {
            "name": "John B",
            "age": "31",
            "data": {
                "gender": "Male",
                "height": "5'11",
                "telephones": [{
                    "home": "1234567890"
                },
                {
                    "mobile": "0987654321"
                }
                ]
            }
        }
    ]
}

This can contain a lot of other data and clients. What I am trying to do is filter the dict so that I retrieve only the fields I want and put them in a new dict. For example, I want the name, gender, and home phone of every client. I loop through all the clients and have been trying to use the code below, but I can't get the nested fields to work. Is there any way to use "in" to filter nested fields? Thanks

new_dict = {
    k: v for k, v in clientDict.items()
        if k in {'name'}
        # I've tried 'data.gender' and other variants, but nothing works here
    }

CodePudding user response：

Since you have a nested dictionary, your filtering logic also has to be nested. Here's an example that works with the sample data you've provided and returns the filtered data in its original structure:

new_dict = {
    "clients": [{
        "name": client["name"],
        "data": {
            "gender": client["data"]["gender"],
            "telephones": [
                phone
                for phone in client["data"].get("telephones", [])
                if "home" in phone
            ]
        }
    } for client in client_dict["clients"]]
}

If you want to do it without hardcoding the specific structure, a recursive function is a good way to handle arbitrary nesting. Here's an example with a function that takes a set of keys to include. It produces a slightly nicer result than the hard-coded version because it drops the telephones list entirely when no home phone is given (the same truthiness check also drops any other empty or falsy value, such as "" or 0):

def filter_nested_dict(obj, keys):
    if isinstance(obj, list):
        new_list = []
        for i in obj:
            new_i = filter_nested_dict(i, keys)
            if new_i:
                new_list.append(new_i)
        return new_list
    if isinstance(obj, dict):
        new_dict = {}
        for k, v in obj.items():
            if k not in keys:
                continue
            new_v = filter_nested_dict(v, keys)
            if new_v:
                new_dict[k] = new_v
        return new_dict
    return obj

new_dict = filter_nested_dict(
    client_dict,
    {"clients", "name", "data", "gender", "telephones", "home"}
)

from pprint import pprint
pprint(new_dict)

Result:

{'clients': [{'data': {'gender': 'Male'}, 'name': 'John A'},
             {'data': {'gender': 'Male',
                       'telephones': [{'home': '1234567890'}]},
              'name': 'John B'}]}
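
The question mentions trying 'data.gender' as a key. As a rough sketch only, dotted paths like that can be made to work with a small helper that walks the nested dicts one key at a time. The names get_path and pick_paths are made up for this illustration, and the sketch assumes the path only passes through dicts (it does not look inside lists such as "telephones"):

```python
def get_path(obj, path, default=None):
    """Follow a dotted path such as "data.gender" through nested dicts."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def pick_paths(client, paths):
    """Build a flat dict holding only the dotted paths that exist."""
    missing = object()  # sentinel, so stored values like None or "" are kept
    picked = {}
    for path in paths:
        value = get_path(client, path, missing)
        if value is not missing:
            picked[path] = value
    return picked


# Sample data copied from the question.
client_dict = {
    "clients": [
        {"name": "John A", "Age": "27",
         "data": {"gender": "Male", "height": "6'2"}},
        {"name": "John B", "age": "31",
         "data": {"gender": "Male", "height": "5'11",
                  "telephones": [{"home": "1234567890"},
                                 {"mobile": "0987654321"}]}},
    ]
}

filtered = [pick_paths(c, ["name", "data.gender"]) for c in client_dict["clients"]]
print(filtered)
# [{'name': 'John A', 'data.gender': 'Male'}, {'name': 'John B', 'data.gender': 'Male'}]
```

Unlike the recursive function, this returns a flat dict keyed by the path strings, which may or may not be the shape you want.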

CodePudding user response：

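Maybe something like this. Note that it only checks each client's top-level keys, so the nested 'gender' field is not picked up: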

clients = []
for elem in data['clients']:
    clients.append({k: v for k, v in elem.items() if k in {'name', 'gender'}})
print({'clients': clients})

Output:

{'clients': [{'name': 'John A'}, {'name': 'John B'}]}
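
The loop above only looks at each client's top-level keys, which is why 'gender' is missing from its output. One possible workaround, sketched here under the assumption that every client keeps its nested fields in a "data" dict, is to merge that dict into the top level before filtering. The variable names wanted and flat are only illustrative:

```python
# Sample data copied from the question.
data = {
    "clients": [
        {"name": "John A", "Age": "27",
         "data": {"gender": "Male", "height": "6'2"}},
        {"name": "John B", "age": "31",
         "data": {"gender": "Male", "height": "5'11",
                  "telephones": [{"home": "1234567890"},
                                 {"mobile": "0987654321"}]}},
    ]
}

wanted = {"name", "gender"}
clients = []
for elem in data["clients"]:
    # Merge the nested "data" dict into the top level.
    # If a key exists in both places, the nested value wins.
    flat = {**elem, **elem.get("data", {})}
    clients.append({k: v for k, v in flat.items() if k in wanted})

print({"clients": clients})
# {'clients': [{'name': 'John A', 'gender': 'Male'}, {'name': 'John B', 'gender': 'Male'}]}
```

This only flattens one level, so it would still not reach the home phone inside the "telephones" list.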

CodePudding user response：

Another solution, a little more explicit. It creates a new list of clients in which each client has a home_phone key whose value is a list of that client's home phone numbers:

dct = {
    "clients": [
        {
            "name": "John A",
            "Age": "27",
            "data": {"gender": "Male", "height": "6'2"},
        },
        {
            "name": "John B",
            "age": "31",
            "data": {
                "gender": "Male",
                "height": "5'11",
                "telephones": [
                    {"home": "1234567890"},
                    {"mobile": "0987654321"},
                ],
            },
        },
    ]
}


def get_all_home_phones(lst):
    out = []
    for phone in lst:
        if "home" in phone:
            out.append(phone["home"])
    return out


out = []
for c in dct["clients"]:
    out.append(
        {
            "name": c["name"],
            "gender": c["data"]["gender"],
            "home_phone": get_all_home_phones(c["data"].get("telephones", [])),
        }
    )

print(out)

Prints:

[{'name': 'John A', 'gender': 'Male', 'home_phone': []},
 {'name': 'John B', 'gender': 'Male', 'home_phone': ['1234567890']}]
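
If a single value is preferred over a list for home_phone, a variant of the helper could return just the first home number, or None when there is none. This is only a sketch, and first_home_phone is a hypothetical name:

```python
def first_home_phone(phones):
    """Return the first "home" number in a list of phone dicts, or None."""
    # next() takes the first match from the generator, or the default (None).
    return next((p["home"] for p in phones if "home" in p), None)


print(first_home_phone([{"home": "1234567890"}, {"mobile": "0987654321"}]))  # 1234567890
print(first_home_phone([]))  # None
```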

CodePudding user response：

Here is another solution. item.get() returns None when a key is missing instead of raising a KeyError, so a client without a name or age does not break the loop. The code also checks that 'data' is present before reading from it and skips the nested block otherwise, so it runs without errors whichever fields a client happens to have.

# data is the dict from the question
details = []
for item in data['clients']:
    # Start from a fresh dict for each client so that optional fields
    # (gender, telephones) cannot carry over from the previous client.
    clients = {}
    clients['name'] = item.get('name')
    clients['age'] = item.get('age')

    if 'data' in item:
        if 'gender' in item['data']:
            clients['gender'] = item['data']['gender']

        if 'telephones' in item['data']:
            for contact in item['data']['telephones']:
                if 'home' in contact:
                    clients['telephones'] = [contact.get('home')]

    details.append(clients)

print(details)

Prints (John A's age is None because the sample data spells that key "Age"):

[{'name': 'John A', 'age': None, 'gender': 'Male'}, 
 {'name': 'John B', 'age': '31', 'gender': 'Male', 'telephones': ['1234567890']}]