I want to convert each page of a PDF file to a separate image. To do this, I use Ghostscript.NET.
The problem is that I can't figure out why pageImage is null after the line System.Drawing.Image pageImage = rasterizer.GetPage(dpi, i); executes.
Here is the method I use:
public static List<string> GetPDFPageText(Stream pdfStream, string dataPath)
{
    try
    {
        int dpi = 100;
        GhostscriptVersionInfo lastInstalledVersion =
            GhostscriptVersionInfo.GetLastInstalledVersion(
                GhostscriptLicense.GPL | GhostscriptLicense.AFPL,
                GhostscriptLicense.GPL);
        List<string> textParagraphs = new List<string>();
        using (GhostscriptRasterizer rasterizer = new GhostscriptRasterizer())
        {
            rasterizer.Open(pdfStream, lastInstalledVersion, false);
            // Ghostscript page numbers are 1-based
            for (int i = 1; i <= rasterizer.PageCount; i++)
            {
                // here is the problem, pageImage returns null
                System.Drawing.Image pageImage = rasterizer.GetPage(dpi, i);
                // rest of code is unrelated to problem..
            }
        }
        return textParagraphs;
    }
    catch (Exception ex)
    {
        // Pass the original exception along so the real cause is not lost
        throw new Exception("An error occurred.", ex);
    }
}
The Stream pdfStream parameter comes from the code below:
using (StreamCollection streamCollection = new StreamCollection())
{
    FileStream imageStream = new FileStream(imagePath, FileMode.Open, FileAccess.Read);
    // This is the parameter I used for "Stream pdfStream"
    FileStream pdfStream = new FileStream(pdfPath, FileMode.Open, FileAccess.Read);
    streamCollection.Streams.Add(imageStream);
    streamCollection.Streams.Add(pdfStream);
    PDFHelper.SavePDFByFilesTest(dataPath, streamCollection.Streams, mergedFilePath);
}
I am already comfortable with the StreamCollection class because I used it before in a similar situation and it worked. I verified that the file path is correct and that the stream contains the file. I also tried using a MemoryStream instead of a FileStream, and a file name instead of a stream, just to see whether the problem is related to them. Do you have any suggestions? I would really appreciate it.
Answer:
Okay, I figured out why it didn't work. I use the latest version of Ghostscript (9.56.1), as K J mentioned (thank you for the response), and it uses a new PDF interpreter by default. I assume it didn't work properly because the new interpreter is very recent and may still have some minor problems. I added the following line to switch back to the old PDF interpreter:
rasterizer.CustomSwitches.Add("-dNEWPDF=false");
I also set the resolution of the produced image with the following line:
rasterizer.CustomSwitches.Add("-r300x300");
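To show where the interpreter switch fits in the flow from the question, here is a minimal sketch. It assumes that custom switches have to be added before Open is called (so that they are passed along when the interpreter starts), and that the installed Ghostscript version still ships the old PDF interpreter. The class name PdfPageRenderer, the method SavePagesAsPng and the outputDir parameter are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using Ghostscript.NET;
using Ghostscript.NET.Rasterizer;

// Hypothetical helper, not part of Ghostscript.NET
public static class PdfPageRenderer
{
    // Renders every page of the PDF and saves it as a PNG file in outputDir.
    public static List<string> SavePagesAsPng(Stream pdfStream, string outputDir, int dpi)
    {
        GhostscriptVersionInfo version =
            GhostscriptVersionInfo.GetLastInstalledVersion(
                GhostscriptLicense.GPL | GhostscriptLicense.AFPL,
                GhostscriptLicense.GPL);

        var savedFiles = new List<string>();
        using (var rasterizer = new GhostscriptRasterizer())
        {
            // Assumption: switches are only picked up if added before Open
            rasterizer.CustomSwitches.Add("-dNEWPDF=false");
            rasterizer.Open(pdfStream, version, false);

            // Ghostscript page numbers are 1-based
            for (int i = 1; i <= rasterizer.PageCount; i++)
            {
                Image pageImage = rasterizer.GetPage(dpi, i);
                if (pageImage == null)
                {
                    // Fail loudly instead of continuing with a null image
                    throw new InvalidOperationException(
                        "Ghostscript returned no image for page " + i + ".");
                }

                using (pageImage)
                {
                    string path = Path.Combine(outputDir, "page-" + i + ".png");
                    pageImage.Save(path, ImageFormat.Png);
                    savedFiles.Add(path);
                }
            }
        }
        return savedFiles;
    }
}
```

The explicit null check is there so that a failed render surfaces immediately with the page number, rather than as a NullReferenceException further down.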
Furthermore, I will share the structure of the StreamCollection class. I followed an existing reference to implement this class. Hope it helps someone.
public class StreamCollection : IDisposable
{
    private bool disposedValue;
    public List<Stream> Streams { get; set; }

    public StreamCollection()
    {
        Streams = new List<Stream>();
    }

    protected virtual void Dispose(bool disposing)
    {
        if (!disposedValue)
        {
            if (disposing)
            {
                // TODO: dispose managed state (managed objects)
                if (this.Streams != null && this.Streams.Count > 0)
                {
                    foreach (var stream in this.Streams)
                    {
                        if (stream != null)
                            stream.Dispose();
                    }
                }
            }
            // TODO: free unmanaged resources (unmanaged objects) and override finalizer
            // TODO: set large fields to null
            disposedValue = true;
        }
    }

    // // TODO: override finalizer only if 'Dispose(bool disposing)' has code to free unmanaged resources
    // ~StreamCollection()
    // {
    //     // Do not change this code. Put cleanup code in 'Dispose(bool disposing)' method
    //     Dispose(disposing: false);
    // }

    public void Dispose()
    {
        // Do not change this code. Put cleanup code in 'Dispose(bool disposing)' method
        Dispose(disposing: true);
        GC.SuppressFinalize(this);
    }
}