I am trying to filter a list of values and store the elements that have .xlsx and .csv extensions in new lists. However, I don't get the full string output I want.
Reference data (edit: updated the dummy dataset with other categories such as TP2/TP3, not just TP1):
dummylist = ['TP1',
'TP1/NXXXX',
'TP1/NXXXX/sample.csv',
'TP1/OX',
'TP1/OX/sample1.csv',
'TP1/TLXX/sample2.csv',
'TP1/TLXX/sample.xlsx',
'TP1/TLXX/sample1.xlsx',
'TP2',
'TP3',
]
I tried the code below:
excellist = []
csvlist = []
for items in dummylist:
    # Look only at the last path component, e.g. 'sample.csv'
    temp_name = items.split('/')[-1]
    if temp_name.endswith(".csv"):
        csvlist.append(items)
    elif temp_name.endswith(".xlsx"):
        excellist.append(items)
print(excellist)
['sample.xlsx', 'sample1.xlsx']
print(csvlist)
['sample.csv', 'sample1.csv', 'sample2.csv']
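Edit: I changed the calls to append(items) and now get the full paths (the output shown above is from the earlier version, which appended temp_name).
Question: how do I get this only for entries under the keyword TP1, ignoring the other elements in the list? The desired result is: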
excellist = [ 'TP1/TLXX/sample.xlsx',
'TP1/TLXX/sample1.xlsx']
csvlist =['TP1/NXXXX/sample.csv', 'TP1/OX/sample1.csv',
'TP1/TLXX/sample2.csv']
CodePudding user response:
You could also construct a dictionary that maps file types to lists of files, something like this:
filetypes = dict()
for f in dummylist:
    # The text after the last '.' is used as the key, e.g. 'csv' or 'xlsx'
    k = f.split('.')[-1]
    if k in filetypes:
        filetypes[k].append(f)
    else:
        filetypes[k] = [f]
excellist = filetypes['xlsx']
csvlist = filetypes['csv']
For a less verbose version of the loop, either use the built-in dict method setdefault,
for f in dummylist:
    k = f.split('.')[-1]
    filetypes.setdefault(k, []).append(f)
or, if you're a fan of the collections module like me, a defaultdict:
from collections import defaultdict
filetypes = defaultdict(list)
for f in dummylist:
    k = f.split('.')[-1]
    filetypes[k].append(f)
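Note that the loops above group every entry, including ones without an extension, and do not restrict the result to TP1 as the question asks. A minimal sketch of one way to do both, assuming the keyword is always the first path component; the variable name prefix is my own, and os.path.splitext is swapped in for the split('.') call so that entries without an extension can be skipped:

```python
from collections import defaultdict
import os

dummylist = ['TP1', 'TP1/NXXXX', 'TP1/NXXXX/sample.csv', 'TP1/OX',
             'TP1/OX/sample1.csv', 'TP1/TLXX/sample2.csv',
             'TP1/TLXX/sample.xlsx', 'TP1/TLXX/sample1.xlsx', 'TP2', 'TP3']

prefix = 'TP1'  # hypothetical name for the keyword to keep
filetypes = defaultdict(list)
for f in dummylist:
    # Skip anything whose first path component is not the keyword
    if f.split('/')[0] != prefix:
        continue
    # splitext returns ('path/name', '.ext'); the extension is '' for folders
    ext = os.path.splitext(f)[1]
    if ext:
        filetypes[ext].append(f)

excellist = filetypes['.xlsx']
csvlist = filetypes['.csv']
```

The keys here include the leading dot ('.csv', '.xlsx') because that is what os.path.splitext returns.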
CodePudding user response:
Try using list comprehensions and just looking at the last four or five characters of each string:
csv, xlsx = ([x for x in dummylist if x[-4:] == '.csv'],
[x for x in dummylist if x[-5:] == '.xlsx'])
print(csv) # -> ['TP1/NXXXX/sample.csv', 'TP1/OX/sample1.csv', 'TP1/TLXX/sample2.csv']
print(xlsx) # -> ['TP1/TLXX/sample.xlsx', 'TP1/TLXX/sample1.xlsx']
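To keep only the TP1 entries, a prefix check could be added to the same idea. This is a sketch rather than a definitive answer: it assumes every wanted path starts with 'TP1/', uses str.endswith in place of slicing, and adds one made-up entry ('TP2/OX/other.csv') that is not in the original data, purely to show the filter excluding something.

```python
dummylist = ['TP1', 'TP1/NXXXX', 'TP1/NXXXX/sample.csv', 'TP1/OX',
             'TP1/OX/sample1.csv', 'TP1/TLXX/sample2.csv',
             'TP1/TLXX/sample.xlsx', 'TP1/TLXX/sample1.xlsx', 'TP2', 'TP3',
             'TP2/OX/other.csv']  # last entry is hypothetical, for illustration

# Keep only paths below TP1; the trailing slash avoids matching e.g. 'TP10/...'
tp1_items = [x for x in dummylist if x.startswith('TP1/')]

csvlist = [x for x in tp1_items if x.endswith('.csv')]
excellist = [x for x in tp1_items if x.endswith('.xlsx')]
```

Filtering by prefix first keeps the two extension checks short and makes it easy to swap in another keyword later.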