Cannot convert PDF page to image

Time:05-29

I want to convert each page of a PDF file to a separate image, and I use Ghostscript.NET to do this. The problem is that rasterizer.GetPage(dpi, i) returns null, so pageImage is null in the line System.Drawing.Image pageImage = rasterizer.GetPage(dpi, i); and I can't figure out why. Here is the method I use:

    // Requires: using System; using System.Collections.Generic; using System.IO;
    //           using Ghostscript.NET; using Ghostscript.NET.Rasterizer;
    public static List<string> GetPDFPageText(Stream pdfStream, string dataPath)
    {
        try
        {
            int dpi = 100;
            GhostscriptVersionInfo lastInstalledVersion =
                GhostscriptVersionInfo.GetLastInstalledVersion(
                    GhostscriptLicense.GPL | GhostscriptLicense.AFPL,
                    GhostscriptLicense.GPL);
            List<string> textParagraphs = new List<string>();

            using (GhostscriptRasterizer rasterizer = new GhostscriptRasterizer())
            {
                rasterizer.Open(pdfStream, lastInstalledVersion, false);

                for (int i = 1; i <= rasterizer.PageCount; i++)
                {
                    // here is the problem, pageImage returns null
                    System.Drawing.Image pageImage = rasterizer.GetPage(dpi, i);

                    // rest of code is unrelated to problem..
                }
            }

            return textParagraphs;
        }
        catch (Exception ex)
        {
            // Keep the original exception as InnerException so the real cause is not lost
            throw new Exception("An error occurred.", ex);
        }
    }

The Stream pdfStream parameter comes from the code below:

            using (StreamCollection streamCollection = new StreamCollection())
            {
                FileStream imageStream = new FileStream(imagePath, FileMode.Open, FileAccess.Read);
                // This is the parameter I used for "Stream pdfStream"
                FileStream pdfStream = new FileStream(pdfPath, FileMode.Open, FileAccess.Read);
                streamCollection.Streams.Add(imageStream);
                streamCollection.Streams.Add(pdfStream);
                PDFHelper.SavePDFByFilesTest(dataPath, streamCollection.Streams,mergedFilePath);
            }

I am comfortable with the StreamCollection class because I have used it before in a similar situation and it worked. I verified that the file path is correct and that the stream contains the file. I also tried a MemoryStream instead of a FileStream, and a file name instead of a stream, to rule them out as the cause. Do you have any suggestions? I would really appreciate it.

CodePudding user response:

Okay, I figured out why it didn't work. I use the latest version of Ghostscript (9.56.1), as K J mentioned (thank you for the response), and it uses a new PDF interpreter by default. I assume the new interpreter didn't work properly in my case because it is still very new and may have some problems for now. I added the following line to switch back to the old PDF interpreter:

rasterizer.CustomSwitches.Add("-dNEWPDF=false");

I also set the resolution of the produced image with the following line:

rasterizer.CustomSwitches.Add("-r300x300");
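
To show where these two lines fit, here is a minimal sketch of the rasterizing loop with both switches applied. It only uses the Ghostscript.NET calls that already appear in the question; the class name PdfPageExporter, the method SavePagesAsPng, the outputDir parameter and the PNG file naming are hypothetical and only there for illustration. The switches are added before Open on the assumption that they are read when the interpreter starts.

```csharp
using System;
using System.Drawing.Imaging;
using System.IO;
using Ghostscript.NET;
using Ghostscript.NET.Rasterizer;

// Hypothetical helper class, not part of the original code.
public static class PdfPageExporter
{
    // Saves every page of the PDF in pdfStream as a PNG file in outputDir.
    public static void SavePagesAsPng(Stream pdfStream, string outputDir, int dpi = 100)
    {
        GhostscriptVersionInfo version =
            GhostscriptVersionInfo.GetLastInstalledVersion(
                GhostscriptLicense.GPL | GhostscriptLicense.AFPL,
                GhostscriptLicense.GPL);

        using (GhostscriptRasterizer rasterizer = new GhostscriptRasterizer())
        {
            // Assumption: custom switches are picked up when Open starts the
            // interpreter, so they are added first.
            rasterizer.CustomSwitches.Add("-dNEWPDF=false"); // use the old PDF interpreter
            rasterizer.CustomSwitches.Add("-r300x300");      // resolution switch from the answer

            rasterizer.Open(pdfStream, version, false);

            for (int i = 1; i <= rasterizer.PageCount; i++)
            {
                using (System.Drawing.Image pageImage = rasterizer.GetPage(dpi, i))
                {
                    // Fail loudly instead of passing a null image further down.
                    if (pageImage == null)
                    {
                        throw new InvalidOperationException(
                            "GetPage returned null for page " + i);
                    }

                    string fileName = "page-" + i + ".png"; // hypothetical naming scheme
                    pageImage.Save(Path.Combine(outputDir, fileName), ImageFormat.Png);
                }
            }
        }
    }
}
```

The explicit null check is a design choice for this sketch: if the interpreter still fails to render a page, the error names the page number instead of surfacing later as a NullReferenceException.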

In addition, here is the structure of the StreamCollection class, which I implemented with the help of an external reference. I hope it helps someone.

public class StreamCollection :  IDisposable
    {
        private bool disposedValue;
        
        public List<Stream> Streams { get; set; }

        public StreamCollection()
        {
            Streams = new List<Stream>();
        }
        
        protected virtual void Dispose(bool disposing)
        {
            if (!disposedValue)
            {
                if (disposing)
                {
                    // TODO: dispose managed state (managed objects)
                    if (this.Streams != null && this.Streams.Count>0)
                    {
                        foreach (var stream in this.Streams)
                        {
                            if (stream != null)
                                stream.Dispose();
                        }
                    }
                }

                // TODO: free unmanaged resources (unmanaged objects) and override finalizer
                // TODO: set large fields to null
                disposedValue = true;
            }
        }

        // // TODO: override finalizer only if 'Dispose(bool disposing)' has code to free unmanaged resources
        // ~StreamCollection()
        // {
        //     // Do not change this code. Put cleanup code in 'Dispose(bool disposing)' method
        //     Dispose(disposing: false);
        // }

        public void Dispose()
        {
            // Do not change this code. Put cleanup code in 'Dispose(bool disposing)' method
            Dispose(disposing: true);
            GC.SuppressFinalize(this);
        }
    }
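
As a quick check of what the class above does, the following sketch adds two in-memory streams to a StreamCollection and shows that leaving the using block disposes both of them. It assumes the StreamCollection class shown above is compiled in the same project; the StreamCollectionDemo class and the sample bytes are made up for illustration.

```csharp
using System;
using System.IO;

// Hypothetical demo class; relies on the StreamCollection class defined above.
public static class StreamCollectionDemo
{
    public static void Main()
    {
        MemoryStream first = new MemoryStream(new byte[] { 1, 2, 3 });
        MemoryStream second = new MemoryStream(new byte[] { 4, 5, 6 });

        using (StreamCollection collection = new StreamCollection())
        {
            collection.Streams.Add(first);
            collection.Streams.Add(second);

            // Inside the using block the streams are still open.
            Console.WriteLine(first.CanRead);  // prints "True"
        }

        // Dispose() on the collection has disposed every stream it holds.
        Console.WriteLine(first.CanRead);      // prints "False"
        Console.WriteLine(second.CanRead);     // prints "False"
    }
}
```

Because the collection owns the streams, the caller should not wrap each FileStream in its own using block as well; one using around the collection is enough.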