I am interested in making a simple checker that takes a PDF as input and determines whether that PDF is tagged for screen readers. This information isn't available in the metadata. Does anyone know, or can anyone point me in the right direction on, whether this is possible with JavaScript, possibly with PDF.js?
Thank you!
CodePudding user response:
Some PDFs contain content that a screen reader can handle without any tags, some PDFs have tags yet still cannot be read properly, and some are correctly Tagged PDF files that fully conform to all PDF/UA or PDF/A-2 requirements.
So for screen reading, there is little point in looking for the mere presence of tags or tagging. The meaningful test is whether the file passes muster with a conformance checker.
From iText:
If you have a document that has a picture of a fox and a dog, iText can't add any missing alt text for those images, because iText can't see that fox nor that dog.
PDF objects can be encrypted or encoded, so the structure is not always easy to detect with a simple scan; however, some data must not be encoded. If you are lucky, the unencoded metadata may include the string pdfua or PDF/UA, which indicates only an attempt at conformance, not proof of it. Also beware of any tagged file that contains an article about PDF/UA production but is not itself a PDF/UA file :-)
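If you still want a quick first-pass hint in JavaScript, here is a minimal sketch of the raw-scan idea above. It assumes the catalog and XMP metadata are stored unencoded, which is often not the case (newer PDFs commonly pack the catalog into compressed object streams, and encrypted files hide it entirely), so a negative result proves nothing. The function name `scanPdfForTagHints` and the shape of the returned object are my own invention for illustration. The markers it looks for are the `/MarkInfo` and `/Marked true` entries and the `/StructTreeRoot` key that a Tagged PDF catalog carries, plus the `pdfuaid:part` XMP property that PDF/UA files use to declare themselves.

```javascript
// Heuristic only: scans the raw bytes of a PDF for tagging markers.
// It cannot see into compressed object streams or encrypted files,
// and a positive result says nothing about the quality of the tags.
// `scanPdfForTagHints` is a hypothetical helper name, not a library API.
function scanPdfForTagHints(bytes) {
  // Decode as a single-byte encoding so binary data cannot break decoding.
  const text = new TextDecoder("latin1").decode(bytes);
  return {
    // Catalog entry that holds the "Marked" flag.
    hasMarkInfo: /\/MarkInfo\b/.test(text),
    // Set when the producer declares the file to be Tagged PDF.
    markedTrue: /\/Marked\s+true\b/.test(text),
    // Root of the logical structure tree.
    hasStructTreeRoot: /\/StructTreeRoot\b/.test(text),
    // XMP property used by PDF/UA files: a claim, not proof of conformance.
    claimsPdfUa: /pdfuaid:part/i.test(text),
  };
}
```

In a browser you could feed it `new Uint8Array(await file.arrayBuffer())` from a file input. If you go the PDF.js route instead, I believe the document proxy exposes a `getMarkInfo()` method that reads the same catalog entry after decoding, but check that against the PDF.js API docs for your version. Either way, treat the result as "someone tried to tag this", and use a real conformance checker for anything beyond that.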