How to get data from list of dict on basis of string

Time:08-16

I have a list of dictionaries that looks like this:

data = [{'name': 'root/folder1/asd/file.csv'},
        {'name': 'root/folder1/bsd/file.csv'}, 
        {'name': 'root/folder1/folder2/folder3/new.csv'}, 
        {'name': 'root/folder1/folder2/folder3'}]

I want to take this list and pare it down to include only files that have a certain extension and exist in the shallowest folder in the list. That is, if a path has two / in it, and all other paths have at least two, then filter out all paths that have more than two. If the smallest number of slashes is three, then filter out anything with more than three.

This is what I started with:

import re

for path in data:
    name = path.get('name', '')
    # endswith accepts a tuple, so both extensions are checked in one call
    if name.endswith(('.exe', '.csv')):
        path_count = len(re.findall('/', name))
        path.update({'path_count': path_count})

My plan is to find the minimum count with a second for loop over the list. Is there a cleaner way to do this?

CodePudding user response:

It's a little unclear what you're asking, but you can leverage list comprehensions and their filters to get a list of objects that meet some arbitrary requirements:

extensions = {"exe", "csv"}

def ext_filter(path: str) -> bool:
    ext = path.split(".")[-1]
    return ext in extensions

def slash_filter(path: str, count: int) -> bool:
    return ext_filter(path) and count == path.count("/")

# Smallest slash count among the paths that have a wanted extension
slash_count = min(path.get("name", "").count("/") for path in data if ext_filter(path.get("name", "")))

valid_paths = [path for path in data if slash_filter(path.get("name", ""), slash_count)]
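
As a self-contained variation on the above, here is a minimal sketch that wraps both steps in one function. The name `shallowest_files` and the use of `os.path.splitext` are assumptions made for illustration, not part of the original answer. Unlike a bare `min(...)`, it returns an empty list instead of raising when no path has a wanted extension.

```python
import os

EXTENSIONS = {".exe", ".csv"}  # assumed set of wanted extensions (with the dot)

def shallowest_files(data, extensions=EXTENSIONS):
    """Return the dicts whose 'name' has a wanted extension and the fewest slashes."""
    # First pass: keep only entries whose extension is in the wanted set.
    matches = [d for d in data
               if os.path.splitext(d.get("name", ""))[1] in extensions]
    if not matches:
        return []
    # Second pass: find the smallest depth, then keep entries at that depth.
    depth = min(d["name"].count("/") for d in matches)
    return [d for d in matches if d["name"].count("/") == depth]

data = [{'name': 'root/folder1/asd/file.csv'},
        {'name': 'root/folder1/bsd/file.csv'},
        {'name': 'root/folder1/folder2/folder3/new.csv'},
        {'name': 'root/folder1/folder2/folder3'}]

print(shallowest_files(data))
```

With the sample data from the question, this keeps the two `file.csv` entries and drops both the deeper `new.csv` and the extensionless `folder3` path.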

CodePudding user response:

data = [{'name': 'root/folder1/asd/file.csv'},
        {'name': 'root/folder1/bsd/file.csv'}, 
        {'name': 'root/folder1/folder2/folder3/new.csv'}, 
        {'name': 'root/folder1/folder2/folder3'}]


# Smallest number of slashes across all path names
# (this looks at every path, not only those with a wanted extension)
min_count = min(d['name'].count('/') for d in data)

for el in data:
    for key, item in el.items():
        if item.count('/') == min_count and item.endswith(('.exe', '.csv')):
            print(f'key {key} item {item}')
                  
Output:

key name item root/folder1/asd/file.csv
key name item root/folder1/bsd/file.csv

CodePudding user response:

If running time matters, you can do it in a single pass, keeping only the shortest paths seen so far. For example:

results = []
n = None  # smallest slash count seen so far
for item in data:
    path = item['name']
    # include the dot so that names like 'fileexe' do not match
    if path.endswith(('.exe', '.csv')):
        path_count = path.count('/')
        if n is None or path_count < n:
            # first match, or a shallower path: start the list over
            results = [path]
            n = path_count
        elif path_count == n:
            results.append(path)
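
To see the reset branch in action, here is a rough sketch of the same single-pass idea wrapped in a function. The function name `shallowest_in_one_pass`, the `suffixes` parameter, and the `sample` data are hypothetical and chosen only so that a deeper match comes before a shallower one.

```python
def shallowest_in_one_pass(data, suffixes=(".exe", ".csv")):
    """Collect the matching paths with the fewest slashes in a single loop."""
    results = []
    min_depth = None  # no matching path seen yet
    for item in data:
        path = item["name"]
        if not path.endswith(suffixes):
            continue  # wrong extension, skip
        depth = path.count("/")
        if min_depth is None or depth < min_depth:
            # First match, or a shallower one: discard what was collected.
            results = [path]
            min_depth = depth
        elif depth == min_depth:
            results.append(path)
    return results

# Hypothetical sample: the deepest file comes first on purpose.
sample = [
    {"name": "a/b/c/deep.csv"},
    {"name": "a/top.csv"},
    {"name": "a/other.exe"},
    {"name": "a/b"},
]

print(shallowest_in_one_pass(sample))  # ['a/top.csv', 'a/other.exe']
```

The list is rebuilt whenever a shallower match turns up, so the data is traversed only once, at the cost of occasionally throwing away paths that were collected earlier.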