I have two folders, each containing ca. 100 PDF files produced by different runs of the same PDF generation program. After I make changes to this program, the resulting PDFs should stay identical: nothing should break the layout, the fonts, any graphs and so on. This is why I would like to check for visual equality while ignoring any metadata that might have changed because the program ran at different times.
My first approach was based on
Sadly, if all 100 look similar under a simple comparison, then they all still need testing against each other, so you need a batch file that runs approximately 4,950 (100 x 99 / 2) fast binary tests:
test 1.pdf 2.pdf report
test 1.pdf 3.pdf report
...
test 1.pdf 100.pdf report
test 2.pdf 3.pdf report
test 2.pdf 4.pdf report
...
test 98.pdf 99.pdf report
test 98.pdf 100.pdf report
test 99.pdf 100.pdf report
Then filter out the ones reported as similar and visually inspect the much smaller number that remain, i.e. those reported as not matched.
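As a rough sketch of that batch run in Python rather than a batch file: the function names `all_pairs` and `binary_report` are my own, and the byte-for-byte comparison uses the standard `filecmp` module. Note that this is only the fast first filter; two PDFs that differ only in metadata will still land in the unmatched list and need the visual check.

```python
import filecmp
from itertools import combinations
from pathlib import Path


def all_pairs(files):
    """Return every unordered pair once: n files give n*(n-1)/2 pairs."""
    return list(combinations(sorted(files), 2))


def binary_report(folder):
    """Compare every pair of PDFs in `folder` byte for byte.

    Returns (matched, unmatched), each a list of (name_a, name_b) tuples.
    """
    pdfs = sorted(Path(folder).glob("*.pdf"))
    matched, unmatched = [], []
    for a, b in all_pairs(pdfs):
        # shallow=False forces a content comparison instead of
        # trusting file size and modification time alone
        same = filecmp.cmp(a, b, shallow=False)
        (matched if same else unmatched).append((a.name, b.name))
    return matched, unmatched
```

Calling `binary_report("some_folder")` (folder name is a placeholder) gives you the two lists; only the unmatched one needs a closer look.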
So if 49 = 30 = 1 and 60 = 45 = 25 = 2, but no others match, then only 1 and 2 need a closer look. Of course there will likely be more, and you can get a second opinion on those too.
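Collapsing the reported matches into groups can be sketched like this. It is a small union-find, which is my choice of technique and not something the test tool does for you; `group_matches` is a made-up name, and the lowest file number in each group is kept as the one to inspect.

```python
def group_matches(matches):
    """Merge pairwise matches into groups keyed by their lowest member."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # keep the smaller root so it ends up as the representative
            low, high = sorted((root_a, root_b))
            parent[high] = low

    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return {rep: sorted(members) for rep, members in groups.items()}


# The example above: 49 = 30 = 1 and 60 = 45 = 25 = 2
reported = [(49, 30), (30, 1), (60, 45), (45, 25), (25, 2)]
print(group_matches(reported))
```

With that input the groups are keyed by 1 and 2, so those are the only two files left to look at.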
If you know which page is likely to change, you can test only the rendered image of that page, say the 3rd page, if it carries a date or other identifying feature.
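One way to test a single page, sketched under the assumption that poppler's `pdftoppm` is installed and on the PATH (any renderer would do; that choice is mine). The helper names are hypothetical, and the check assumes the same renderer version writes identical PNG bytes for identical pixels.

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path


def render_command(pdf, page, out_root, dpi=72):
    """Build the pdftoppm call that renders one page to <out_root>.png."""
    return ["pdftoppm", "-f", str(page), "-l", str(page),
            "-r", str(dpi), "-png", "-singlefile", str(pdf), str(out_root)]


def page_digest(pdf, page, dpi=72):
    """Render one page and return the SHA-256 of the resulting PNG bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        out_root = Path(tmp) / "page"
        subprocess.run(render_command(pdf, page, out_root, dpi), check=True)
        png = out_root.with_suffix(".png")
        return hashlib.sha256(png.read_bytes()).hexdigest()


def same_page(pdf_a, pdf_b, page=3):
    """True when the rendered PNGs of the given page are byte-identical."""
    return page_digest(pdf_a, page) == page_digest(pdf_b, page)
```

Because this compares rendered pixels and not file bytes, metadata such as creation dates does not affect the result, but it is much slower than the binary test, so it is best kept for the files that the binary test could not match.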