C# iText7 - 'Trailer Not Found' when using PdfReader with PDF string from database

I'm saving the contents of my PDF file (pdfAsString) to the database.
File is of type IFormFile (file uploaded by the user).

string pdfAsString;
using (var reader = new StreamReader(indexModel.UploadModel.File.OpenReadStream()))
{
    pdfAsString = await reader.ReadToEndAsync();
    // pdfAsString = ; // encoding function or lack thereof
}

Later I fetch these contents and use them to initialize a new MemoryStream, create a PdfReader from that stream, and then create a PdfDocument from the reader. At that point I get the 'Trailer not found' exception. I have verified that the trailer part of the PDF is present in the string I use to create the MemoryStream, and I have made sure the stream position is set to the beginning.

The issue seems related to the format of the PDF contents fetched from the database. iText7 reads the beginning of the file but doesn't seem able to navigate through the rest of it.

I'm expecting to be able to create an instance of PdfDocument with the contents of the PDF saved to my database.

Note 1: Using the Stream created from OpenReadStream() works when trying to create a PdfReader and then PdfDocument, but I don't have access to that IFormFile when reading from the DB, so this doesn't help me in my use case.

Note 2: If I use the PDF from my device by giving a path, it works correctly, same for using a FileStream created from a path. However, this doesn't help my use case.

So far, I've tried saving it raw and then using that right out of the gate (1) or encoding special symbols like \n \t to ASCII hexadecimal notation (2). I've also tried HttpUtility.UrlEncode on save and UrlDecode after getting the database record (3), and also tried ToBase64String on save and FromBase64String on get (4).

// var pdfContent = databaseString; // 1
// var pdfContent = databaseString.EncodeSpecialCharacters(); // encode special symbols // 2
// var pdfContent = HttpUtility.UrlDecode(databaseString); // decode urlencoded string // 3
var pdfContent = Convert.FromBase64String(databaseString); // decode base64 // 4

using (var stream = new MemoryStream(pdfContent))
{
    PdfReader pdfReader = new PdfReader(stream).SetUnethicalReading(true);
    PdfWriter pdfWriter = new PdfWriter("new-file.pdf");
    PdfDocument pdf = new PdfDocument(pdfReader, pdfWriter); // exception here :(
    
    // some business logic...
}

Any help would be appreciated.

EDIT: on a separate project, I'm trying to run this code:

using (var stream = File.OpenRead("C:\\<path>\\<filename>.pdf"))
{
    var formFile = new FormFile(stream, 0, stream.Length, null, "<filename>.pdf");
    var reader = new StreamReader(formFile.OpenReadStream());
    var pdfAsString = reader.ReadToEnd();
    var pdfAsBytes = Encoding.UTF8.GetBytes(pdfAsString);

    using (var newStream = new MemoryStream(pdfAsBytes))
    {
        newStream.Seek(0, SeekOrigin.Begin);
        var pdfReader = new PdfReader(newStream).SetUnethicalReading(true);
        var pdfWriter = new PdfWriter("Test-PDF-1.pdf");

        PdfDocument pdf = new PdfDocument(pdfReader, pdfWriter);

        PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
        IDictionary<string, PdfFormField> fields = form.GetFormFields();
        foreach (var field in fields)
        {
            field.Value.SetValue(field.Key);
        }
        //form.FlattenFields();
        pdf.Close();
    }
}

If I replace newStream in the PdfReader constructor with formFile.OpenReadStream(), it works fine; otherwise I get the 'Trailer not found' exception.

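Answer:

Use BinaryReader and ReadBytes instead of StreamReader when first reading the data. StreamReader decodes the bytes as text (UTF-8 by default), and any byte sequence that is not valid UTF-8 gets replaced, so the binary parts of the PDF are altered and the byte offsets iText relies on to locate the cross-reference table and trailer most likely no longer match. Encoding that string afterwards (URL encoding, Base64) cannot bring the lost bytes back, so store the raw bytes instead. Example below: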

using (var stream = File.OpenRead("C:\\<filepath>\\<filename>.pdf"))
{
    // FormFile - my starting point inside of the web application
    var formFile = new FormFile(stream, 0, stream.Length, null, "<filename>.pdf");

    byte[] pdfAsBytes;
    using (var reader = new BinaryReader(formFile.OpenReadStream()))
    {
        // Read the raw bytes with no text decoding; store this in the database
        pdfAsBytes = reader.ReadBytes((int)formFile.Length);
    }

    using (var newStream = new MemoryStream(pdfAsBytes))
    {
        newStream.Seek(0, SeekOrigin.Begin);
        var pdfReader = new PdfReader(newStream).SetUnethicalReading(true);
        var pdfWriter = new PdfWriter("Test-PDF-1.pdf");

        PdfDocument pdf = new PdfDocument(pdfReader, pdfWriter);

        PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
        IDictionary<string, PdfFormField> fields = form.GetFormFields();
        foreach (var field in fields)
        {
            field.Value.SetValue(field.Key);
        }
        //form.FlattenFields();
        pdf.Close();
    }
}
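
To see why the StreamReader version fails, here is a minimal sketch that does not need iText at all. It assumes a tiny hand-made byte array standing in for a PDF (an ASCII header followed by a few high-bit bytes, similar to the binary marker comment most PDFs start with). The class and method names (PdfRoundTripDemo, SampleBytes, ViaStreamReader, ViaBase64) are made up for this illustration and are not part of iText or ASP.NET Core. It compares the string round trip from the question with a Base64 round trip done on the raw bytes, which is one option if the database column has to hold text:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class PdfRoundTripDemo
{
    // Hypothetical stand-in for a PDF: an ASCII header followed by
    // high-bit bytes that do not form valid UTF-8 sequences.
    public static byte[] SampleBytes()
    {
        byte[] header = Encoding.ASCII.GetBytes("%PDF-1.7\n%");
        byte[] binary = { 0xE2, 0xE3, 0xCF, 0xD3, 0x0A };
        return header.Concat(binary).ToArray();
    }

    // What the question does: decode the bytes as text, then encode the
    // text back to bytes. Invalid sequences are replaced during decoding.
    public static byte[] ViaStreamReader(byte[] original)
    {
        using (var reader = new StreamReader(new MemoryStream(original)))
        {
            string text = reader.ReadToEnd();
            return Encoding.UTF8.GetBytes(text);
        }
    }

    // Text-safe alternative: Base64 is applied to the raw bytes, so
    // decoding returns exactly the bytes that went in.
    public static byte[] ViaBase64(byte[] original)
    {
        string stored = Convert.ToBase64String(original);
        return Convert.FromBase64String(stored);
    }

    public static void Main()
    {
        byte[] original = SampleBytes();

        // Prints False: the text round trip changed the bytes.
        Console.WriteLine(original.SequenceEqual(ViaStreamReader(original)));

        // Prints True: the Base64 round trip is lossless.
        Console.WriteLine(original.SequenceEqual(ViaBase64(original)));
    }
}
```

This would also explain why attempt (4) in the question did not help if the Base64 string was built from pdfAsString: the bytes were already changed by StreamReader before encoding. Base64 only helps when it is applied to the bytes read with BinaryReader (or copied out of the stream some other way without decoding).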