Filtering list of dicts based on a key value in python

Time:10-05

I have a list of dictionaries in Python that looks like this:

 list = [{'entityType': 'source', 'databaseName': 'activities', 'type': 'POSTGRES', 'children': [{'id': '3c144414-0c73-41df-9f0e-4dd7cb5af46e',
       'path': ['Activities (DEV)', 'public'],
       'type': 'CONTAINER',
       'containerType': 'FOLDER'}], 'checkTableAuthorizer': False},
       {'entityType': 'source', 'databaseName': 'pd-prod-dev', 'type': 'POSTGRES', 'children': 
        [{'id': '75d84ead-a9fe-4949-bc21-d4deb34e1ae1',
       'path': ['pg-prd (DEV-RR)', 'pghero'],
       'tag': 'PWGqdrkcD08=',
       'type': 'CONTAINER',
       'containerType': 'FOLDER'},
      {'id': 'facc2c20-7561-430f-ac35-547b5bc7a92f',
       'path': ['pg-prd (DEV-RR)', 'public'],
       'tag': 'gcUL0NTOc 4=',
       'type': 'CONTAINER',
       'containerType': 'FOLDER'}], 'checkTableAuthorizer': False},
 {'entityType': 'source', 'databaseName': 'pd-prod-prd', 'type': 'POSTGRES', 'children': 
        [{'id': '75d84ead-a9fe-4949-bc21-d4deb34e1ae1',
       'path': ['pg-prd (PRD-RR)', 'pghero'],
       'tag': 'PWGqdrkcD08=',
       'type': 'CONTAINER',
       'containerType': 'FOLDER'},
      {'id': 'facc2c20-7561-430f-ac35-547b5bc7a92f',
       'path': ['pg-prd (PRD-RR)', 'public'],
       'tag': 'gcUL0NTOc 4=',
       'type': 'CONTAINER',
       'containerType': 'FOLDER'}], 'checkTableAuthorizer': False}]

This is just a sample. The actual list has 30 dictionaries. What I am trying to do is filter the dictionaries so that the nested children list contains only the 'public' schema. So my expected output would be

     public_list = [{'entityType': 'source', 'databaseName': 'activities', 'type': 'POSTGRES', 'children': [{'id': '3c144414-0c73-41df-9f0e-4dd7cb5af46e',
           'path': ['Activities (DEV)', 'public'],
           'type': 'CONTAINER',
           'containerType': 'FOLDER'}], 'checkTableAuthorizer': False},
           {'entityType': 'source', 'databaseName': 'pd-prod-dev', 'type': 'POSTGRES', 'children': 
            [{'id': 'facc2c20-7561-430f-ac35-547b5bc7a92f',
           'path': ['pg-prd (DEV-RR)', 'public'],
           'tag': 'gcUL0NTOc 4=',
           'type': 'CONTAINER',
           'containerType': 'FOLDER'}], 'checkTableAuthorizer': False},
 {'entityType': 'source', 'databaseName': 'pd-prod-prd', 'type': 'POSTGRES', 'children': 
            [{'id': 'facc2c20-7561-430f-ac35-547b5bc7a92f',
           'path': ['pg-prd (PRD-RR)', 'public'],
           'tag': 'gcUL0NTOc 4=',
           'type': 'CONTAINER',
           'containerType': 'FOLDER'}], 'checkTableAuthorizer': False}]

I tried accessing the nested children by iterating, but I can't figure out what condition to use for the filter

for d in list:
    for k, v in d.items():
        if k == 'children':
            print(v)

I would love to have this as a function, since I'll be reusing it on a pandas column that holds lists of dicts

CodePudding user response:

You could create a function that gets the public data for children of each entry:

def get_public_data(data):
    result = []
    children = data.get("children")
    if children:
        for row in children:
            path = row.get("path")
            if path and "public" in path:
                result.append(row)
    return result

And then create a new list of entries where you just replace the value of the children key:

public_list = []
for x in entities:  # entities is the input list of dicts (named list in the question)
    public_data = get_public_data(x)
    if public_data:
        public_list.append({**x, "children": public_data})

Combine these two and you'll get the function you need.
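As a rough sketch of what the combined version could look like: the name filter_public and the trimmed-down sample data are made up for illustration, and it assumes a child counts as public when the string 'public' appears anywhere in its path.

```python
def filter_public(entities):
    """Return copies of the entries that have at least one 'public' child,
    with the children list reduced to those public children."""
    public_list = []
    for entry in entities:
        # Treat a missing or None 'children' / 'path' as empty.
        public_children = [
            child for child in entry.get("children") or []
            if "public" in (child.get("path") or [])
        ]
        if public_children:
            # {**entry, ...} builds a new dict, so the input is not modified.
            public_list.append({**entry, "children": public_children})
    return public_list


# Hypothetical, shortened sample in the shape of the question's data.
sample = [
    {"databaseName": "activities",
     "children": [{"path": ["Activities (DEV)", "public"]}]},
    {"databaseName": "pd-prod-dev",
     "children": [{"path": ["pg-prd (DEV-RR)", "pghero"]},
                  {"path": ["pg-prd (DEV-RR)", "public"]}]},
    {"databaseName": "no-public",
     "children": [{"path": ["other", "pghero"]}]},
]

result = filter_public(sample)
```

Since it takes one list of dicts and returns a new one, it should also work on a pandas column via something like df["col"].apply(filter_public), where "col" stands in for your actual column name.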

CodePudding user response:

IIUC you want to collect the entries where all items have a public schema?

Assuming every child always has a valid 'path' with exactly 2 elements, you can use a simple comprehension:

[d for d in lst
 if all(e['path'][1] == 'public' for e in d['children'])
]

NB. I called your input lst as list is a python builtin
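To make the behaviour of this reading concrete, here is a small sketch on made-up, shortened data (the 'empty' entry is hypothetical). It keeps an entry only if every child is public, so entries that mix 'pghero' and 'public' are dropped entirely rather than trimmed. Note also that all() on an empty iterable is True, so an entry with no children passes the filter.

```python
lst = [
    {"databaseName": "activities",
     "children": [{"path": ["Activities (DEV)", "public"]}]},
    {"databaseName": "pd-prod-dev",
     "children": [{"path": ["pg-prd (DEV-RR)", "pghero"]},
                  {"path": ["pg-prd (DEV-RR)", "public"]}]},
    {"databaseName": "empty", "children": []},  # hypothetical edge case
]

# Keep only the entries whose children ALL have 'public' as second path element.
all_public = [
    d for d in lst
    if all(e["path"][1] == "public" for e in d["children"])
]

names = [d["databaseName"] for d in all_public]
print(names)  # ['activities', 'empty']
```

If entries without children should be excluded, adding `d["children"] and` in front of the all(...) condition would be one way to do it.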
