I'd like to read a multipage PDF file from the file system and split it into separate pages. I want to store the split pages in a list, then save each page in the list as a Base64-encoded element in an XML document and send that over a REST API service.
What I have already done:
- Read the PDF file
- Split the PDF file into pages
- Save the pages to a list
What's not relevant and is listed only for completeness:
- Sending the pages via a REST API service (currently no code implemented)
- The XML structure
What's my problem? If I save a page from the byte array list to the file system and open the resulting PDF (which should contain a single page of the original document), the file is corrupt and cannot be opened.
What mistake did I make in this process?
Which version of iTextSharp am I using? 5.4.2
public List<byte[]> SplitAndSaveWithMemoryStream(string inputFilePath)
{
    // create the list
    List<byte[]> pages = new List<byte[]>();
    // get the input pdf
    using (PdfReader reader = new PdfReader(inputFilePath))
    {
        // iterate over the pages
        for (int pagenumber = 1; pagenumber <= reader.NumberOfPages; pagenumber++)
        {
            Document document = new Document();
            MemoryStream memoryStream = new MemoryStream();
            PdfCopy copy = new PdfCopy(document, memoryStream);
            document.Open();
            // copy the n-th page to the memory stream
            copy.AddPage(copy.GetImportedPage(reader, pagenumber));
            memoryStream.Position = 0;
            // save the page from the memory stream to the list
            pages.Add(memoryStream.ToArray());
            // -- TEST ---
            // save the file, only for test (the list is 0-based, pagenumber is 1-based)
            // string fullfilePath = "C://temp/temp.pdf";
            // File.WriteAllBytes(fullfilePath, pages[pagenumber - 1]);
            document.Close();
        }
    }
    return pages;
}
XML structure:
<document>
    <page>
        <number>1</number>
        <content>dZwmxjwiduasmma...</content>
    </page>
    <page>
        <number>2</number>
        <content>ddiw92kjd2002jd929...</content>
    </page>
    <page>
        <number>3</number>
        <content>d82kdjEiuwowpdo...</content>
    </page>
</document>
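For completeness, here is a minimal sketch of how the page list could be turned into the XML above. It assumes System.Xml.Linq is available; the class PageXmlBuilder and the method BuildDocumentXml are hypothetical names used only for illustration and are not part of my actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PageXmlBuilder
{
    // Hypothetical helper: wraps each page's bytes as Base64 text
    // inside a <page> element, numbered starting at 1.
    public static XDocument BuildDocumentXml(IList<byte[]> pages)
    {
        return new XDocument(
            new XElement("document",
                pages.Select((bytes, index) =>
                    new XElement("page",
                        new XElement("number", index + 1),
                        new XElement("content", Convert.ToBase64String(bytes))))));
    }
}
```

Calling PageXmlBuilder.BuildDocumentXml(pages).ToString() would then give the XML text to send; on the receiving side, Convert.FromBase64String restores the original bytes of each page.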
Solution, based on the answer by 'mkl':
public List<byte[]> SplitAndSaveWithMemoryStream(string inputFilePath)
{
    // create the list
    List<byte[]> pages = new List<byte[]>();
    // get the input pdf
    using (PdfReader reader = new PdfReader(inputFilePath))
    {
        // iterate over the pages
        for (int pagenumber = 1; pagenumber <= reader.NumberOfPages; pagenumber++)
        {
            Document document = new Document();
            MemoryStream memoryStream = new MemoryStream();
            PdfCopy copy = new PdfCopy(document, memoryStream);
            document.Open();
            // copy the n-th page to the memory stream
            copy.AddPage(copy.GetImportedPage(reader, pagenumber));
            // close the document first so the PDF is completely written
            document.Close();
            // save the page from the memory stream to the list
            pages.Add(memoryStream.ToArray());
            // -- TEST ---
            // save the file, only for test (the list is 0-based, pagenumber is 1-based)
            // string fullfilePath = "C://temp/temp.pdf";
            // File.WriteAllBytes(fullfilePath, pages[pagenumber - 1]);
        }
    }
    return pages;
}
Answer:
The PDF in the MemoryStream is not finished until document is closed. Thus, you store incomplete PDFs.
To fix this, move document.Close(); up, right after copy.AddPage(...);, so that it runs before memoryStream.ToArray() is called.
As an aside, if I recall correctly, you don't need memoryStream.Position = 0 before memoryStream.ToArray(), as ToArray always returns the full content of the memory stream, regardless of the current position.
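The aside about ToArray can be checked with a small standalone sketch that uses only System.IO. The class name ToArrayDemo and the method Lengths are made up for this illustration.

```csharp
using System;
using System.IO;

public static class ToArrayDemo
{
    // Returns the length of ToArray() taken with Position at the end
    // of the stream and again after rewinding Position to 0.
    public static int[] Lengths()
    {
        using (var memoryStream = new MemoryStream())
        {
            memoryStream.Write(new byte[] { 1, 2, 3, 4 }, 0, 4);
            // After writing, Position is at the end of the stream.
            int atEnd = memoryStream.ToArray().Length;
            memoryStream.Position = 0;
            int atStart = memoryStream.ToArray().Length;
            return new[] { atEnd, atStart };
        }
    }
}
```

Both lengths come out the same, so resetting Position before ToArray makes no difference; the corrupt files come only from calling ToArray before document.Close().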