TL;DR: Looking for a python library to create a PDF template with specific styling and fill it with information from JSON file
Full Context:
I have a long RPA pipeline that ends with 500 Json documents. Each JSON document represents an exam, each exam might have 1000-4000 Questions. The JSON file is simple, an example of that:
{
"AllQuestions": [
{
"QuestionText": "A 3-year old man did.....",
"Choices": ["Choice A", "Choice B", etc.]
}, another question, etc. ]
}
The only variable here is that sometimes I can have 5 Choices or 4 Choices and sometimes I have an image in the exam (However, I can handle those specs once I know what to use).
Well I have to create a style that's similar to this one:
"Without Key Info, attending labs, etc."
Now, I looked into PyPDF2 and FPDF, and best what I could reach is this style:
Now, for FPDF2, it is pretty straight forward, in just a few lines of codes, I could create that by initializing and class and adding page and adding the question to it. However, the styling there is very limited and I tried to make use of "WriteHTML" and it still can't reach my desired styling at all.
I read that PDFKit or other alternatives are good, do you think I should first create a full HTML document with 1000 questions then take that into PDFKit and convert it into PDF? or is there a way to treat each question as an object with default styling and append it to a PDF file object?
Thanks in advance :)
CodePudding user response:
I don't know about the most Pythonic way, but I would do it like so:
- Figure out what language you want to define the final output in. Since it has a lot of complex formatting, I'd say you want HTML (probably with CSS) or Latex.
- Write a Jinja template in this target language, with variables in the appropriate places.
- Plug the values from your JSON into Jinja to render the template and construct the HTML/Latex of every question.
- Use pandoc to convert the HTML to PDF.
While this is quite a few technologies, they are all well suited to their task and easier to work with. The problem here is that you want to build PDFs with very specific layout. However PDFs are very complex and not all libraries implement it well - but pandoc does.