iText and PDFBox are not detecting all the form fields present in the PDF

Time:04-02

I'm using the code below to count the form fields in a PDF with iText and PDFBox in Java. The attached PDF has 11 fields, but the fields on page 1 are not detected, and the size printed is 2 in both cases.

        // iText: count the fields registered in the AcroForm
        PdfDocument doc = new PdfDocument(new PdfReader(file));
        PdfAcroForm form = PdfAcroForm.getAcroForm(doc, true);
        System.out.println("form fields size from iText: " + form.getFormFields().size());


        // PDFBox: count the root fields registered in the AcroForm
        PDDocument document = PDDocument.load(file);
        PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
        List<PDField> fields = acroForm.getFields();
        System.out.println("form fields size from PDFBox: " + fields.size());

PDF FILE HERE IN THIS LINK

CodePudding user response:

The form information in your PDF is inconsistent.

The global AcroForm form definition in your PDF contains only 2 fields, Text Field 6 and Text Field 7, which happen to be the two fields on page two.

Page one in its Annots array references ten form field widgets, each of them merged with a form field object. These fields are not referenced from the AcroForm form definition in your PDF. Thus, they are not part of the form of the PDF but merely some arbitrary annotations hanging around.

To fix the issue, reference the form fields of the page-one widget annotations from the Fields array of the AcroForm form definition.
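
One way to do this could look like the following sketch. It assumes PDFBox 2.x (the `PDDocument.load` call in the question suggests that version) and works directly on the underlying COS objects. The class name `RegisterOrphanFields` and the two command-line arguments (input and output path) are made up for illustration. It also assumes that every unregistered field is a merged field/widget dictionary with a `T` (name) entry and no `Parent` entry, which matches the situation described above but may not hold for other files.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

// Hypothetical helper class: registers widget-merged fields that are missing
// from the AcroForm Fields array.
public class RegisterOrphanFields {

    public static void main(String[] args) throws IOException {
        File input = new File(args[0]);   // assumed: path of the inconsistent PDF
        File output = new File(args[1]);  // assumed: path for the repaired copy

        try (PDDocument document = PDDocument.load(input)) {
            PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
            if (acroForm == null) {
                throw new IllegalStateException("The document has no AcroForm");
            }

            // Fetch the Fields array of the AcroForm, creating it if it is absent.
            COSDictionary formDict = acroForm.getCOSObject();
            COSBase fieldsBase = formDict.getDictionaryObject(COSName.FIELDS);
            COSArray fields;
            if (fieldsBase instanceof COSArray) {
                fields = (COSArray) fieldsBase;
            } else {
                fields = new COSArray();
                formDict.setItem(COSName.FIELDS, fields);
            }

            for (PDPage page : document.getPages()) {
                for (PDAnnotation annotation : page.getAnnotations()) {
                    if (!(annotation instanceof PDAnnotationWidget)) {
                        continue;
                    }
                    COSDictionary widget = annotation.getCOSObject();

                    // Assumption: a widget with a T entry and no Parent entry
                    // is a root field merged with its single widget.
                    boolean mergedRootField = widget.containsKey(COSName.T)
                            && !widget.containsKey(COSName.PARENT);

                    // Add it only if the Fields array does not reference it yet.
                    if (mergedRootField && fields.indexOfObject(widget) < 0) {
                        fields.add(widget);
                    }
                }
            }

            System.out.println("form fields size after repair: "
                    + acroForm.getFields().size());
            document.save(output);
        }
    }
}
```

After running this on the file, counting the fields in the saved copy with the code from the question should include the page-one fields as well. Widgets that do carry a `Parent` entry are skipped here; if their parent field is also unregistered, it would have to be added separately.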
