I received a large number of document files, where each document has its own split archive with one file per page (e.g. file1.001, file1.002, file2.001, file3.001). These are meant to be TIF files that can easily be combined and converted into PDF documents.
However, some of these files will not convert through ImageMagick. Some can simply be converted using a different program, which works fine, but for other files that doesn't work either. I tried converting them to .jpg and then to .tif, but they won't convert to .jpg. Things got weird when I converted them to .png, as some of these files would produce multiple output files.
This is hard to explain, but I'll try to give an example: file1.001 and file1.002 both show the same image when converted to .tif and opened. However, when either of the .tif documents is converted to .png, two .png files are created. One has the original page, but the other has a second page of the document that I could not view previously.
What could be causing this weird behavior, and how can I convert these to PDF more reliably? I also used BlueBeam Staple to convert the files, if that helps at all. Edit: I've verified I'm on the latest ImageMagick release, and I've been using it through PHP to process files. I'm running Windows 10. Also, here are some example files to play around with. The first TIF actually shows the second page, instead of the page I normally see when I open the file.
Edit 2: Sorry, I thought uploading the image would preserve the file type. Here's
CodePudding user response:
When I convert your TIFF to PNG, I get two files using IM 7.1.0-10 Q16-HDRI or IM 6.9.12-25 Q16, both on Mac OSX Sierra.
magick -quiet 294944.tif x.png
This produces two PNG files, one for each page stored in the TIFF (output images not shown).
Is this not what you get or expect?
P.S.
What are the other two files, 327924.001 and 327924.002?
If those are some kind of split TIFF, then it does not look like libtiff, which ImageMagick uses to read TIFFs, can handle them. I get errors when attempting to use identify on them.
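One quick way to see whether a numbered file such as 327924.001 is a TIFF at all is to look at its first four bytes. The sketch below is only an illustration in Python, not part of ImageMagick or libtiff; the function name sniff_tiff is made up for this example.

```python
def sniff_tiff(header: bytes) -> str:
    """Classify a file by its first four bytes (byte order mark + magic number)."""
    signatures = {
        b"II*\x00": "TIFF, little-endian",     # "II" + magic 42
        b"MM\x00*": "TIFF, big-endian",        # "MM" + magic 42
        b"II+\x00": "BigTIFF, little-endian",  # "II" + magic 43
        b"MM\x00+": "BigTIFF, big-endian",     # "MM" + magic 43
    }
    return signatures.get(header[:4], "not a TIFF signature")
```

To try it on a real file, read the first four bytes in binary mode, for example sniff_tiff(open("327924.001", "rb").read(4)). A valid signature only says the header looks like TIFF; the rest of the file can still be damaged or written in a way libtiff rejects.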
CodePudding user response:
You definitely have some issue with whatever attempted to write those TIFFs:
instrument 294944 page 1 of 2 = G4 199 dpi sheet 2 of 2 294944.tif (25.17 x 17.53 inches)
instrument 294944 page 2 of 2 = G4 199 dpi sheet 1 of 2 294944.tif (24.12 x 17.63 inches)
instrument 327501 page 1 of 1 = UN 72 dpi sheet 1 of 1 327924.001 (124.78 x 93.86 inches)
instrument 327924 page 1 of 2 = G4 400 dpi sheet 1 of 2 327924.002 (23.80 x 17.53 inches)
instrument 327924 page 2 of 2 = G4 400 dpi sheet 2 of 2 327924.002 (23.84 x 17.41 inches)
Two of the files are identified as CCITT Group 4 fax encoding (G4 in the listing above), which is common for TIFFs of this type.
TIFF is a multi-image format, so a multipage fax can be viewed as one file, or the four CMYK colour plates of a print job can be sent as one image file, either to be overlaid as one check print or to be printed one at a time for quality inking.
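To make the multi-image point concrete: each page in a TIFF is described by its own image file directory (IFD), and every IFD points to the next one. The following is a minimal sketch in Python that walks that chain and reports size, compression and resolution per page. It is an assumption-laden illustration, not what ImageMagick or libtiff do internally: it only handles classic (non-BigTIFF) files and only reads single-value tags, and list_tiff_pages is a name invented here.

```python
import struct

# A few common values of the Compression tag (259); not an exhaustive list.
COMPRESSION = {1: "uncompressed", 3: "CCITT Group 3", 4: "CCITT Group 4",
               5: "LZW", 7: "JPEG", 32773: "PackBits"}

def list_tiff_pages(data: bytes) -> list[dict]:
    """Walk the IFD chain of a classic TIFF and describe each page."""
    byte_order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if byte_order is None or struct.unpack_from(byte_order + "H", data, 2)[0] != 42:
        raise ValueError("not a classic TIFF file")
    (offset,) = struct.unpack_from(byte_order + "I", data, 4)
    pages, seen = [], set()
    while offset and offset not in seen:   # offset 0 ends the chain
        seen.add(offset)                   # guard against looping chains
        (count,) = struct.unpack_from(byte_order + "H", data, offset)
        tags = {}
        for i in range(count):
            entry = offset + 2 + 12 * i    # each IFD entry is 12 bytes
            tag, typ, n = struct.unpack_from(byte_order + "HHI", data, entry)
            if typ == 3 and n == 1:        # SHORT, stored inline
                (tags[tag],) = struct.unpack_from(byte_order + "H", data, entry + 8)
            elif typ == 4 and n == 1:      # LONG, stored inline
                (tags[tag],) = struct.unpack_from(byte_order + "I", data, entry + 8)
            elif typ == 5 and n == 1:      # RATIONAL, stored at an offset
                (pos,) = struct.unpack_from(byte_order + "I", data, entry + 8)
                num, den = struct.unpack_from(byte_order + "II", data, pos)
                tags[tag] = num / den if den else 0.0
        code = tags.get(259, 1)            # Compression defaults to 1
        pages.append({
            "width": tags.get(256),        # ImageWidth
            "height": tags.get(257),       # ImageLength
            "compression": COMPRESSION.get(code, f"code {code}"),
            "x_resolution": tags.get(282), # XResolution
        })
        (offset,) = struct.unpack_from(byte_order + "I", data, offset + 2 + 12 * count)
    return pages
```

Calling list_tiff_pages(open("294944.tif", "rb").read()) on a two-page file should return two entries, which is why a conversion to a single-image format such as PNG yields two output files.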
The extension .tif (or .tiff) is usually applied to files with one or more pages (even 400 for a long novel).
Numbered names such as part001.tif, part002.tif are usually applied to groups of multiple pages, OR to single sequential pages, as in part1.001.tif, part1.002.tif.
Unfortunately, you have a mix. The naming seems to follow a convention where the extension indicates the number of pages (002 = 2 pages), but the pages are in inconsistent order, so you need to check which convention was used for each file, as there is uncertainty.
Also, the internal number does NOT always match the filename (327924.001 reports instrument 327501). Perhaps a transfer of interest?
In addition, you have a mix of compression methods and resolutions, so you cannot be sure of the correct scale to apply.
The best way to resolve this is to decide how you wish the pages to be regrouped and sequenced, use the correct scale for each page or group of pages, then recombine them as desired into PDF.
For a large number of files, it would help to tabulate the pages by number, scale, size, compression, etc., and then process them in identical groups before reordering and merging.
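As a rough sketch of that tabulating step, assuming you have already collected the file name, sheet number, compression and dpi for every sheet (the rows below are copied from the listing earlier in this answer, and group_sheets is a name invented for this example), grouping in Python could look like this:

```python
from collections import defaultdict

# (file name, sheet number within the file, compression, dpi), copied from
# the listing earlier in this answer.
SHEETS = [
    ("294944.tif", 1, "G4", 199),
    ("294944.tif", 2, "G4", 199),
    ("327924.001", 1, "UN", 72),
    ("327924.002", 1, "G4", 400),
    ("327924.002", 2, "G4", 400),
]

def group_sheets(sheets):
    """Group sheets that share compression and resolution."""
    groups = defaultdict(list)
    for name, sheet, compression, dpi in sheets:
        groups[(compression, dpi)].append((name, sheet))
    return dict(groups)

# Print one line per group so each can be converted with the same settings.
for (compression, dpi), members in sorted(group_sheets(SHEETS).items()):
    print(f"{compression} at {dpi} dpi: {members}")
```

Each group can then be converted with one set of scale settings, and the resulting pages reordered by instrument and page number before merging into PDF.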