I have code that generates a Python list, and I sorted the list with these lines:
mylist = list_files(MAIN_DIR)
print(sorted(mylist, key=lambda x: str(x.split('\\')[-1][:-4].replace('_',''))))
Now the list is sorted as desired. How can I split this main list into sublists based on similar PDF names? Have a look at the snapshot to see exactly what I mean.
Now I can loop through the keys and values of the grouped dictionary
grouped_files = {}
for el in mylist:
    file_name = el.split('\\')[-1].split('.')[0]
    if file_name not in grouped_files.keys():
        grouped_files[file_name] = []
    grouped_files[file_name].append(el)
for key, value in grouped_files.items():
    pdfs = value
    merger = PdfFileMerger()
    for pdf in pdfs:
        merger.append(pdf)
    merger.write(OUTPUT_DIR / f'{key}.pdf')
    merger.close()
But I got an error: AttributeError: 'WindowsPath' object has no attribute 'write'
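The traceback suggests that merger.write was handed a pathlib path object rather than a string or an open file. Below is a minimal sketch of one possible workaround, assuming PdfFileMerger comes from PyPDF2 and that the installed version only opens the output file itself when it is given a str. The helper name merge_groups and the merger_factory parameter are hypothetical, introduced only so the sketch can run without PyPDF2 installed.

```python
from pathlib import Path


def merge_groups(grouped_files, output_dir, merger_factory):
    """Write one merged PDF per group and return the output paths as strings.

    merger_factory is any zero-argument callable returning an object with
    append(), write() and close() methods (assumed: PyPDF2's PdfFileMerger).
    """
    written = []
    for key, pdfs in grouped_files.items():
        merger = merger_factory()
        for pdf in pdfs:
            merger.append(str(pdf))
        # Pass a plain string, not a Path object, as the output target.
        target = str(Path(output_dir) / f"{key}.pdf")
        merger.write(target)
        merger.close()
        written.append(target)
    return written
```

With PyPDF2 installed, this would be called as merge_groups(grouped_files, OUTPUT_DIR, PdfFileMerger). Opening the file yourself with open(path, 'wb') and passing the file object to write should work as well.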
CodePudding user response:
Here's a small example using groupby to create a dict of lists.
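from itertools import groupby
mylist = ['..\\1\\1.pdf', '..\\2\\2.pdf', '..\\1\\1.pdf', '..\\2\\2.pdf']
# Key: file name without the ".pdf" suffix and without underscores
key = lambda x: str(x.split('\\')[-1][:-4].replace('_', ''))
results = {}
# groupby only merges adjacent items, so sort by the same key first
for filename, grouped in groupby(sorted(mylist, key=key), key=key):
    results[filename] = list(grouped)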
results is a dictionary whose keys are the same keys that were used for sorting. Depending on what exactly you want to take into consideration, you can use different key functions to derive different groupings. One thing to note when using groupby is that it only groups consecutive items, so if you want exactly one group per key, you need to sort by that key first. The other thing to note is that grouped is a lazy iterator, not a list. That is done for efficiency, but if you want a list, you can call list(grouped) to convert it, as shown.
>>> import pprint
>>> pprint.pprint(results)
{'1': ['..\\1\\1.pdf', '..\\1\\1.pdf'], '2': ['..\\2\\2.pdf', '..\\2\\2.pdf']}
>>>
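To illustrate why the sort matters, here is a small sketch that runs groupby on the same sample list with and without sorting. The names unsorted_groups and sorted_groups are made up for this illustration, and the key is written as a regular function instead of a lambda.

```python
from itertools import groupby


def key(path):
    # Same key as above: file name without the 4-character ".pdf" suffix
    # and with underscores removed.
    return path.split('\\')[-1][:-4].replace('_', '')


mylist = ['..\\1\\1.pdf', '..\\2\\2.pdf', '..\\1\\1.pdf', '..\\2\\2.pdf']

# Without sorting, groupby starts a new group every time the key changes.
unsorted_groups = [(k, list(g)) for k, g in groupby(mylist, key=key)]

# After sorting by the same key, each key appears exactly once.
sorted_groups = [(k, list(g))
                 for k, g in groupby(sorted(mylist, key=key), key=key)]

print(len(unsorted_groups))  # 4
print(len(sorted_groups))    # 2
```

If the unsorted groups were stored in a dict, later groups would overwrite earlier ones with the same key, so some paths would be silently lost.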
It can also be solved with a defaultdict. This doesn't require having to sort anything first.
from collections import defaultdict
mylist = ['..\\1\\1.pdf', '..\\2\\2.pdf', '..\\1\\1.pdf', '..\\2\\2.pdf']
key = lambda x: str(x.split('\\')[-1][:-4].replace('_', ''))
results = defaultdict(list)
for filename in mylist:
    results[key(filename)].append(filename)
Which results in
>>> import pprint
>>> pprint.pprint(results)
defaultdict(<class 'list'>,
            {'1': ['..\\1\\1.pdf', '..\\1\\1.pdf'],
             '2': ['..\\2\\2.pdf', '..\\2\\2.pdf']})
Using defaultdict means you can look up a key that is not yet in the dictionary, in this case the same sort key, and get a default value back instead of a KeyError. Here it is set to create an empty list whenever it sees a new key, so we can append to the value every time.
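The difference from a plain dict can be seen in a short sketch. The variable names plain and raised are only for illustration.

```python
from collections import defaultdict

# A plain dict raises KeyError when a missing key is looked up.
plain = {}
try:
    plain['1'].append('..\\1\\1.pdf')
    raised = False
except KeyError:
    raised = True

# A defaultdict(list) first creates an empty list for the missing key,
# so the append succeeds.
results = defaultdict(list)
results['1'].append('..\\1\\1.pdf')

print(raised)               # True
print(len(results['1']))    # 1
```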
CodePudding user response:
I'm guessing you want the file paths grouped by file name. If that's the case, my approach would be something like the following:
mylist = list_files(MAIN_DIR)
grouped_files = {}
for el in mylist:
    # File name without directory and without extension
    file_name = el.split('\\')[-1].split('.')[0]
    if file_name not in grouped_files:
        grouped_files[file_name] = []
    grouped_files[file_name].append(el)
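As a variation on the loop above, the same grouping could be sketched with pathlib and dict.setdefault. This is only one possible alternative: the function name group_by_stem and the sample paths are hypothetical, and PureWindowsPath is used so that backslash-separated paths are parsed the same way on any operating system.

```python
from pathlib import PureWindowsPath


def group_by_stem(paths):
    """Group path strings by file name without directory and extension."""
    grouped = {}
    for el in paths:
        stem = PureWindowsPath(el).stem
        # setdefault creates the list the first time a stem is seen.
        grouped.setdefault(stem, []).append(el)
    return grouped


# Hypothetical sample data, not taken from the question.
example = ['..\\1\\report.pdf', '..\\2\\report.pdf', '..\\1\\invoice.pdf']
grouped_files = group_by_stem(example)
```

One difference to be aware of: stem keeps everything before the last dot, whereas split('.')[0] keeps only the part before the first dot, so the two give different keys for names such as a.b.pdf.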