Save PDF with memory stream in a list using iTextSharp

I'd like to read a multipage PDF file from the file system and split it into separate pages. The split pages should be stored in a list object. Each page in the list should then be saved as a base64-encoded value in an XML structure and sent over a REST API service.

What I have already done:

Read the pdf file
Split the pdf file to pages
Save the pages to a list

What's not relevant and only listed for completeness:

Send the pages with a REST API service (currently no code implemented)
XML structure

What's my problem? If I save the pages from the byte array list to the file system and open one of the PDFs (each containing only one page of the original document), the PDF file is corrupt and can't be opened.

What mistake did I make in this process?

Which version of iTextSharp do I use? 5.4.2

public List<byte[]> SplitAndSaveWithMemoryStream(string inputFilePath)
{
    // create the list
    List<byte[]> pages = new List<byte[]>();

    // get the input pdf
    using (PdfReader reader = new PdfReader(inputFilePath))
    {
        // iterate over the pages
        for (int pagenumber = 1; pagenumber <= reader.NumberOfPages; pagenumber++)
        {
            Document document = new Document();
            MemoryStream memoryStream = new MemoryStream();

            PdfCopy copy = new PdfCopy(document, memoryStream);

            document.Open();

            // copy the n-th page to the memory stream
            copy.AddPage(copy.GetImportedPage(reader, pagenumber));

            memoryStream.Position = 0;

            // save the page from the memory stream to the list
            pages.Add(memoryStream.ToArray());

            // -- TEST ---
            // save the file, only for test (list index is zero-based)
            // string fullfilePath = "C://temp/temp.pdf";
            // File.WriteAllBytes(fullfilePath, pages[pagenumber - 1]);

            document.Close();
        }
    }

    return pages;
}

XML structure:

<document>
  <page>
    <number>1</number>
    <content>dZwmxjwiduasmma...</content>
  </page>
  <page>
    <number>2</number>
    <content>ddiw92kjd2002jd929...</content>
  </page>
  <page>
    <number>3</number>
    <content>d82kdjEiuwowpdo...</content>
  </page>
</document>
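
For completeness, a structure like the one above could be built from the page list roughly as follows. This is only a minimal sketch using System.Xml.Linq and Convert.ToBase64String from the .NET base library; the class and method names (PageXmlSketch, BuildDocumentXml) are hypothetical, and the element names are simply taken from the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PageXmlSketch
{
    // Builds <document><page><number/><content/></page>...</document>
    // from a list of single-page PDFs given as byte arrays.
    public static XElement BuildDocumentXml(List<byte[]> pages)
    {
        return new XElement("document",
            pages.Select((bytes, index) =>
                new XElement("page",
                    // page numbers in the sample start at 1, list indexes at 0
                    new XElement("number", index + 1),
                    // the PDF bytes go into the XML as a base64 string
                    new XElement("content", Convert.ToBase64String(bytes)))));
    }

    public static void Main()
    {
        // Dummy bytes stand in for real PDF pages here.
        var pages = new List<byte[]> { new byte[] { 1, 2, 3 }, new byte[] { 4, 5 } };
        Console.WriteLine(BuildDocumentXml(pages));
    }
}
```

On the receiving side, Convert.FromBase64String turns each content value back into the original bytes.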

Solution with the answer by 'mkl'

// requires: using System.Collections.Generic; using System.IO;
//           using iTextSharp.text; using iTextSharp.text.pdf;
public List<byte[]> SplitAndSaveWithMemoryStream(string inputFilePath)
{
    // create the list
    List<byte[]> pages = new List<byte[]>();

    // get the input pdf
    using (PdfReader reader = new PdfReader(inputFilePath))
    {
        // iterate over the pages
        for (int pagenumber = 1; pagenumber <= reader.NumberOfPages; pagenumber++)
        {
            Document document = new Document();
            MemoryStream memoryStream = new MemoryStream();

            PdfCopy copy = new PdfCopy(document, memoryStream);

            document.Open();

            // copy the n-th page to the memory stream
            copy.AddPage(copy.GetImportedPage(reader, pagenumber));

            // close the document first so the PDF in the memory stream is finished
            document.Close();

            // save the page from the memory stream to the list
            pages.Add(memoryStream.ToArray());

            // -- TEST ---
            // save the file, only for test (list index is zero-based)
            // string fullfilePath = "C://temp/temp.pdf";
            // File.WriteAllBytes(fullfilePath, pages[pagenumber - 1]);
        }
    }

    return pages;
}

Answer:

The PDF in the MemoryStream is not finished before document is closed. Thus, you store incomplete PDFs.

To fix this, move

document.Close();

up right after

copy.AddPage(...);
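
If you want a quick check that a stored byte array is a finished PDF, one rough heuristic is to look for the "%PDF-" header at the start and the "%%EOF" marker near the end. The sketch below uses only the .NET base library; the names PdfSanityCheck and LooksLikeFinishedPdf are made up for illustration, and passing the check does not prove that a file is valid, only that it was not cut off before the end marker.

```csharp
using System;
using System.Text;

public static class PdfSanityCheck
{
    // Heuristic only: checks for the PDF header and the end-of-file marker.
    public static bool LooksLikeFinishedPdf(byte[] bytes)
    {
        if (bytes == null || bytes.Length < 16)
        {
            return false;
        }

        // The first five bytes of a PDF are the ASCII text "%PDF-".
        string head = Encoding.ASCII.GetString(bytes, 0, 5);

        // The "%%EOF" marker sits at the very end; scan the last 1024 bytes at most.
        int tailLength = Math.Min(bytes.Length, 1024);
        string tail = Encoding.ASCII.GetString(bytes, bytes.Length - tailLength, tailLength);

        return head == "%PDF-" && tail.Contains("%%EOF");
    }

    public static void Main()
    {
        // Hand-written stand-ins, not real PDFs: one with and one without the end marker.
        byte[] finished = Encoding.ASCII.GetBytes("%PDF-1.4\n1 0 obj\nendobj\n%%EOF\n");
        byte[] unfinished = Encoding.ASCII.GetBytes("%PDF-1.4\n1 0 obj\nendobj\n");

        Console.WriteLine(LooksLikeFinishedPdf(finished));
        Console.WriteLine(LooksLikeFinishedPdf(unfinished));
    }
}
```

Calling this on each entry of the returned list, with document.Close() placed before and after ToArray(), should make the difference between the two versions visible.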

As an aside, if I recall correctly, you don't need memoryStream.Position = 0 before memoryStream.ToArray() as ToArray always takes the full content of the memory stream.
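
The point about ToArray can be checked without iTextSharp. A minimal sketch using only System.IO; the class and method names (ToArrayDemo, WriteThenToArray) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;

public static class ToArrayDemo
{
    // Writes data to a MemoryStream, moves Position, then returns ToArray().
    public static byte[] WriteThenToArray(byte[] data, long position)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            stream.Write(data, 0, data.Length);

            // ToArray copies the whole content regardless of Position.
            stream.Position = position;
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        byte[] data = { 1, 2, 3, 4, 5 };

        // Position at the start and at the end: both calls return all five bytes.
        Console.WriteLine(WriteThenToArray(data, 0).Length);           // 5
        Console.WriteLine(WriteThenToArray(data, data.Length).Length); // 5
    }
}
```

Position only matters for calls that read from the stream, such as Read or CopyTo.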