Why when I merge it doesn't merge everything that's in the folder

Time:06-03

This code already creates the PDFs. After each PDF is created, it is copied into its own folder. What I am trying to do is merge everything that is in that folder, then go on to the next folder and merge its contents, and so on. But when I run it, it only merges the last PDF and not all the PDFs.

import os
import shutil
import time
from PyPDF2 import PdfFileMerger
from reportlab.pdfgen.canvas import Canvas

path = input("Paste the folder path of which all the PDFs are located to begin the automation.\n")
# Only allowed to use your C or H drive.
while True:
    if "C" in path[0]:
        break
    elif "H" in path[0]:
        break
    else:
        print("Sorry you can only use your C drive or H drive\n")
    path = input("Paste the folder path of which all the PDFs are located to begin the automation.\n")

moving_path = path + "\\Script"
new_path = moving_path + "\\1"
folder_name = {}

# List all directories or files in the specific path
list_dir = ["040844_135208_3_192580_Sample_010.pdf", "040844_135208_3_192580_Sample_020.pdf",
            "040844_135208_3_192580_Sample_050.pdf", "058900_84972_3_192163_Sample_010.pdf",
            "058900_84972_3_192163_Sample_020.pdf", "058900_84972_3_192163_Sample_030.pdf"]


# Pauses the program
def wait(num):
    time.sleep(num)


# Change and make directory
def directory():
    os.chdir(path)

    for i in list_dir:
        canvas = Canvas(i)
        canvas.drawString(72, 72, "Hello, World")
        canvas.save()
    os.makedirs("Script")
    os.chdir(path + "\\Script")
    os.makedirs("1")
    os.makedirs("Merge")
    os.chdir(new_path)


def main():
    match = []
    for i in list_dir:
        search_zero = i.split("_")[2]
        if search_zero != "0":
            match.append((i.split("_", 3)[-1][:6]))

        else:
            match.append((i.split("_", 0)[-1][:6]))

    new_match = []
    for i, x in enumerate(match):
        if "_" in match[i]:
            new_match.append(x[:-1])
        else:
            new_match.append(x)

    for i in list_dir:
        key = i.split("_", 3)[-1][:6]

        if key in folder_name:
            folder_name[key].append(i)
        else:
            folder_name[key] = [i]

    for i, x in enumerate(list_dir):
        # Skips over the error that states that you can't make duplicate folder name
        try:
            os.makedirs((new_match[i]))
        except FileExistsError:
            pass

        # Moves the file that doesn't contain "PDFs" into the "1" folder and the one that does in the "Merge" folder
        if "PDFs" not in list_dir[i]:
            shutil.copy(f"{path}\\{list_dir[i]}", f"{new_path}\\{new_match[i]}")
            os.chdir(f"{new_path}\\{new_match[i]}")
            merger = PdfFileMerger(x)
            merger.append(x)
            merger.write(f"{new_match[i]}.pdf")
            merger.close()
            os.chdir(new_path)

        else:
            shutil.copy(f"{path}\\{list_dir[i]}", f"{moving_path}\\Merge\\{x}")


directory()
wait(0.7)
main()
print("Done!")
wait(2)

CodePudding user response:

I have these 4 PDFs:

pg1.pdf   pg2.pdf   pg3.pdf   pg4.pdf
Pg 1      Pg 2      Pg 3      Pg 4

Here's a starter-script to merge Pg1 and Pg2 into one PDF, and Pg3 and Pg4 into another:

from PyPDF2 import PdfMerger

# Create merger object
merger = PdfMerger()

for pdf in ["pg1.pdf", "pg2.pdf"]:
    merger.append(pdf)

merger.write("merged_1-2.pdf")
merger.close()

# Re-create merger object
merger = PdfMerger()

for pdf in ["pg3.pdf", "pg4.pdf"]:
    merger.append(pdf)

merger.write("merged_3-4.pdf")
merger.close()

Now we extend that idea and wrap up the data so it will drive a loop that does the same thing:

page_sets = [
    # Individual PDFs      , final merged PDF
    [["pg1.pdf", "pg2.pdf"], "merged_1-2.pdf"],
    [["pg3.pdf", "pg4.pdf"], "merged_3-4.pdf"],
]

for pdfs, final_pdf in page_sets:
    merger = PdfMerger()

    for pdf in pdfs:
        merger.append(pdf)

    merger.write(final_pdf)
    merger.close()

I get the following for either the straight-down script, or the loop-y script:

merged_1-2.pdf merged_3-4.pdf

As best I understand your larger intent, that loop represents you writing groups of PDFs into a merged PDF (in separate directories?), and the structure of:

  1. create merger object
  2. append to merger object
  3. write merger object
  4. close merger object
  5. Back to Step 1

works, and as far as I can tell is the way to approach this problem.

As an aside from the issue of getting the merging of the PDFs working... try creating the on-disk folder structure first, then create a data structure like page_sets that represents that on-disk structure, then (finally) pass off the data to the loop to merge. That should also make debugging easier:

  1. "Do I have the on-disk folders correct?", "Yes", then move on to
  2. "Do I have page_sets correct?", "Yes", then move on to
  3. the actual appending/writing

And if the answer to 1 or 2 is "No", you can inspect your file system or just look at a print-out of page_sets and spot any disconnects. From there, merging the PDFs should be really trivial.
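As a rough sketch of step 2, page_sets could be built straight from the file names. This assumes the grouping key is the fourth underscore-separated field, as in the sample names from the question; the helper name build_page_sets and the "<key>.pdf" output naming are made up for illustration and are not part of the original script.

```python
from collections import defaultdict


def build_page_sets(filenames):
    """Group PDF names by key and return [[pdfs, final_pdf], ...]."""
    groups = defaultdict(list)
    for name in filenames:
        # Assumption: the 4th underscore-separated field identifies the group,
        # e.g. "192580" in "040844_135208_3_192580_Sample_010.pdf".
        key = name.split("_")[3]
        groups[key].append(name)
    # Sort each group so the pages are merged in a predictable order.
    return [[sorted(pdfs), f"{key}.pdf"] for key, pdfs in groups.items()]


list_dir = [
    "040844_135208_3_192580_Sample_010.pdf",
    "040844_135208_3_192580_Sample_020.pdf",
    "040844_135208_3_192580_Sample_050.pdf",
    "058900_84972_3_192163_Sample_010.pdf",
    "058900_84972_3_192163_Sample_020.pdf",
    "058900_84972_3_192163_Sample_030.pdf",
]

page_sets = build_page_sets(list_dir)
for pdfs, final_pdf in page_sets:
    print(final_pdf, "<-", pdfs)
```

Printing page_sets like this is the "look at a print-out" check: if a file lands in the wrong group, it shows up here before any PDF is touched.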

Once that's working correctly, if you want to go back and refactor to try and get folders/data/merge in one pass for each set of PDFs, then go for it, but at least you have a working example to fall back on and start to ask where you missed something if you run into problems.

CodePudding user response:

Whenever you end up with something that only contains the last value from a loop, check your loop logic. In this case, your merger loop looks like this:

for i, x in enumerate(list_dir):
    ...

    if "PDFs" not in list_dir[i]:
        ...
        merger = PdfFileMerger(x)
        merger.append(x)
        merger.write(f"{new_match[i]}.pdf")
        merger.close()

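So for each file in list_dir you create a new merger, add that one file, and write out the PDF. Unsurprisingly, each PDF you write contains exactly one input PDF. And because new_match[i] is the same for every file in a group, each write goes to the same output name and overwrites the previous one, which is why only the last PDF seems to survive.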
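As a toy model, with a plain dict standing in for the output folder and lists standing in for merger objects (illustrative names only, no PyPDF2 involved), the two loop shapes compare like this:

```python
files = ["Sample_010.pdf", "Sample_020.pdf", "Sample_050.pdf"]
disk = {}  # stand-in for the output folder: file name -> contents

# Shape from the question: a new "merger" per file, same output name each time.
for name in files:
    merger = []                    # stand-in for a fresh merger object
    merger.append(name)
    disk["192580.pdf"] = merger    # replaces whatever was written before
per_file_result = disk["192580.pdf"]

# Alternative shape: one "merger" per group, written once after the loop.
merger = []
for name in files:
    merger.append(name)
disk["192580.pdf"] = merger
per_group_result = disk["192580.pdf"]

print(per_file_result)   # ['Sample_050.pdf']
print(per_group_result)  # ['Sample_010.pdf', 'Sample_020.pdf', 'Sample_050.pdf']
```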

Move the merger creation and merger.write out of the innermost loop, so that all of the files to be merged are appended together and written out as a single PDF. Your naming logic is a bit convoluted, but it seems that you want to be looping over the variable folder_name, and merging the corresponding files. So, maybe like this:

for key in folder_name:
    merger = PdfFileMerger()
    for x in folder_name[key]:
        merger.append(x)
    merger.write(key + ".pdf")
    merger.close()

You'll need to add your own path and naming logic; I won't try to guess what you intended.
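For what it's worth, here is one way the path handling might look. Treat it as a sketch under assumptions: the function name merge_groups, its parameters, and the "<key>.pdf" output naming are hypothetical, and merger_factory is only a hook so the loop can be exercised without PyPDF2 installed. The default relies on PdfMerger, since PyPDF2 3.x dropped PdfFileMerger in its favor.

```python
import os


def merge_groups(groups, src_dir, out_dir, merger_factory=None):
    """Write one merged PDF per group and return the output paths."""
    if merger_factory is None:
        # Imported lazily; assumes a PyPDF2 version that provides PdfMerger.
        from PyPDF2 import PdfMerger
        merger_factory = PdfMerger

    written = []
    for key, names in groups.items():
        merger = merger_factory()              # one merger per group
        try:
            for name in names:
                # Full paths, so no os.chdir() is needed.
                merger.append(os.path.join(src_dir, name))
            target_dir = os.path.join(out_dir, key)
            os.makedirs(target_dir, exist_ok=True)
            target = os.path.join(target_dir, f"{key}.pdf")
            merger.write(target)               # one write per group
        finally:
            merger.close()
        written.append(target)
    return written
```

With the variables from the question, the call would be something like merge_groups(folder_name, path, new_path), assuming the source PDFs are still in path.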
