Iterating over a list in Python and splitting it based on a condition

Time:07-20

I am trying to filter a list of values and store the elements that have .xlsx and .csv extensions in new lists. However, I don't get the full string output I want.

Reference data (edit: updated the dummy dataset to include other categories such as TP2 and TP3, not just TP1):

dummylist = ['TP1',
             'TP1/NXXXX',
             'TP1/NXXXX/sample.csv',
             'TP1/OX',
             'TP1/OX/sample1.csv',
             'TP1/TLXX/sample2.csv',
             'TP1/TLXX/sample.xlsx',
             'TP1/TLXX/sample1.xlsx',
             'TP2',
             'TP3']

I tried the code below:

excellist = []
csvlist = []

for items in dummylist:
    temp_name = items.split('/')[-1]
    
    if temp_name.endswith(".csv"):
        csvlist.append(items)
        
    elif temp_name.endswith(".xlsx"):
        excellist.append(items)
    
print(excellist)
['sample.xlsx', 'sample1.xlsx']

print(csvlist)
['sample.csv', 'sample1.csv', 'sample2.csv']

Edit: The output shown above is from my first attempt, which appended temp_name. After changing it to append(items), I get the full paths I wanted.

Question: How do I get this result only for entries under TP1 and ignore the other elements in the list? The desired output is:

excellist = ['TP1/TLXX/sample.xlsx',
             'TP1/TLXX/sample1.xlsx']

csvlist = ['TP1/NXXXX/sample.csv',
           'TP1/OX/sample1.csv',
           'TP1/TLXX/sample2.csv']

CodePudding user response:

You could also construct a dictionary that maps file types to lists of files, something like this:

filetypes = dict()

for f in dummylist:
    k = f.split('.')[-1]
    
    if k in filetypes:
        filetypes[k].append(f)
    else:
        filetypes[k] = [f]

excellist = filetypes['xlsx']
csvlist = filetypes['csv']

For a less verbose version of the loop, either use the dict method setdefault,

for f in dummylist:
    k = f.split('.')[-1]
    filetypes.setdefault(k, []).append(f)

or, if you're a fan of the collections module like me, a defaultdict:

from collections import defaultdict

filetypes = defaultdict(list)

for f in dummylist:
    k = f.split('.')[-1]
    filetypes[k].append(f)
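
Note that the loops in this answer group every entry, so names without a dot such as 'TP1' or 'TP1/OX' become keys of their own, and nothing restricts the result to TP1. Below is a minimal sketch of one way to handle both, assuming the category is always the first path component and that pathlib.PurePosixPath is acceptable for parsing these forward-slash paths. The names wanted_prefix and files_by_ext, and the extra 'TP2/OX/other.csv' entry, are made up for illustration and are not from the question.

```python
from collections import defaultdict
from pathlib import PurePosixPath

dummylist = [
    'TP1', 'TP1/NXXXX', 'TP1/NXXXX/sample.csv', 'TP1/OX',
    'TP1/OX/sample1.csv', 'TP1/TLXX/sample2.csv',
    'TP1/TLXX/sample.xlsx', 'TP1/TLXX/sample1.xlsx', 'TP2', 'TP3',
    'TP2/OX/other.csv',  # hypothetical entry, added to show the filter working
]

wanted_prefix = 'TP1'  # hypothetical name for the category to keep
files_by_ext = defaultdict(list)

for f in dummylist:
    path = PurePosixPath(f)
    # Keep only entries whose first component is the wanted category
    # and that actually carry a file extension.
    if path.parts[0] == wanted_prefix and path.suffix:
        files_by_ext[path.suffix].append(f)

excellist = files_by_ext['.xlsx']
csvlist = files_by_ext['.csv']

print(excellist)
print(csvlist)
```

The keys here include the leading dot ('.csv', '.xlsx') because that is what suffix returns. Since files_by_ext is a defaultdict, looking up an extension that never occurred gives an empty list rather than raising KeyError.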

CodePudding user response:

Try using list comprehensions and just looking at the last four or five characters of each string:

csv, xlsx = ([x for x in dummylist if x[-4:] == '.csv'],
             [x for x in dummylist if x[-5:] == '.xlsx'])

print(csv)  # -> ['TP1/NXXXX/sample.csv', 'TP1/OX/sample1.csv', 'TP1/TLXX/sample2.csv']
print(xlsx)  # -> ['TP1/TLXX/sample.xlsx', 'TP1/TLXX/sample1.xlsx']
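
To keep only the TP1 entries, the same comprehensions can take a second condition. This is a minimal sketch, assuming every wanted path starts with the literal prefix 'TP1/'. The variable name prefix and the extra 'TP2/OX/other.csv' entry are made up for illustration.

```python
dummylist = [
    'TP1', 'TP1/NXXXX', 'TP1/NXXXX/sample.csv', 'TP1/OX',
    'TP1/OX/sample1.csv', 'TP1/TLXX/sample2.csv',
    'TP1/TLXX/sample.xlsx', 'TP1/TLXX/sample1.xlsx', 'TP2', 'TP3',
    'TP2/OX/other.csv',  # hypothetical entry, added to show the filter working
]

# Trailing slash so that a category such as 'TP10' would not match.
prefix = 'TP1/'

csvlist = [x for x in dummylist
           if x.startswith(prefix) and x.endswith('.csv')]
excellist = [x for x in dummylist
             if x.startswith(prefix) and x.endswith('.xlsx')]

print(csvlist)
print(excellist)
```

Using str.endswith instead of slicing avoids having to count the characters in each extension.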