I need to save an HTML web page as a PDF in my Google Drive. I am using Google Apps Script.
However, the PDF is rendered differently from the HTML page. I need the PDF to look the same as the HTML.
Here is a screenshot of the HTML
and here is my code:
function downloadFile() {
  var fileURL = "http.....";
  var fileName = "";
  var fileSize = "";
  // Fetch the page, authenticating with the script's OAuth token
  var response = UrlFetchApp.fetch(fileURL, {
    headers: { Authorization: 'Bearer ' + ScriptApp.getOAuthToken() },
  });
  var htmlBody = response.getContentText();
  var rc = response.getResponseCode();
  if (rc == 200) {
    // Convert the fetched HTML to a PDF blob
    var blob = Utilities.newBlob(htmlBody, MimeType.HTML)
      .getAs('application/pdf')
      .setName('Nota.pdf');
    var folder = DriveApp.getFolderById("1gBA8YCs3PH7v7CNl3nlsjNqYzhOxhjYa");
    if (folder != null) {
      // Save the PDF in the Drive folder and record its name and size
      var file = folder.createFile(blob);
      fileName = file.getName();
      fileSize = file.getSize();
    }
  }
  var fileInfo = { rc: rc, filename: fileName, fileSize: fileSize };
  Logger.log(fileInfo);
}
!!!!UPDATE!!!!!!
Here is the HTML for the web page:
I know it is possible to save this PDF correctly, because if I use a Chrome extension called HTML TO PDF, it converts correctly, as shown in the following figure
CodePudding user response:
The best way to do this is with Puppeteer, a Node.js library that can launch a headless web browser.
const puppeteer = require('puppeteer');

// Placeholder: put the HTML you want to convert here.
// String.raw keeps backslashes and other special characters as plain text.
const doc = String.raw`YOUR STRING`;

// await is only valid inside an async function in a CommonJS script
async function htmlToPdf() {
  // I found these sizes work best when outputting to A4 format
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--window-size=794,1123'],
    defaultViewport: {
      width: 980,
      height: 1386
    }
  });
  const page = await browser.newPage();
  // I found that it works better if you first visit a page and then set the content.
  await page.goto("https://google.com");
  // set the html content of the page
  await page.setContent(doc);
  // Placeholder: full output path, including the file name (for example, ending in File.pdf)
  const path = "output path";
  // saves a pdf to the provided path
  await page.pdf({
    path: path,
    scale: 0.8,
    displayHeaderFooter: true,
    printBackground: true,
    margin: {
      top: 80,
      bottom: 80,
      left: 30,
      right: 30
    }
  });
  await browser.close();
}

htmlToPdf().catch(console.error);
Pro: Gives a really clean output with selectable text.
Con: You have to run this code on a Node.js server for the best result.
For more details, see the official Puppeteer documentation.
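Since the question starts from a URL rather than an HTML string, a variation is to let the headless browser load the page itself, so stylesheets and images are fetched exactly as in a normal browser. The following is only a minimal sketch under assumptions: the function names `buildPdfOptions` and `urlToPdf` are made up for illustration, the A4 format and the `networkidle0` wait are choices that may need tuning for a given page, and Puppeteer is assumed to be installed with `npm install puppeteer`.

```javascript
// Hypothetical helper: builds the options object passed to page.pdf().
// Kept separate so the settings can be inspected without launching a browser.
function buildPdfOptions(outputPath, landscape = false) {
  return {
    path: outputPath,        // where the PDF is written
    format: 'A4',
    landscape: landscape,    // set to true for a landscape page
    printBackground: true,   // keep background colors and images
    margin: { top: 80, bottom: 80, left: 30, right: 30 },
  };
}

// Hypothetical wrapper: opens the URL in headless Chrome and prints it to PDF.
async function urlToPdf(url, outputPath, landscape = false) {
  // Loaded lazily so the helper above works even where Puppeteer is not installed
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until network activity has stopped so CSS and images are loaded
    await page.goto(url, { waitUntil: 'networkidle0' });
    await page.pdf(buildPdfOptions(outputPath, landscape));
  } finally {
    // Close the browser even if navigation or printing fails
    await browser.close();
  }
}
```

It could be called as `urlToPdf(fileURL, 'Nota.pdf').catch(console.error)`, where `fileURL` is the page address from the question. If the page requires a login, the headless browser would also need the relevant cookies or headers, which this sketch does not handle.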
Alternatively, a Chromium-based browser can be run headless from the command line with these switches:
--headless --disable-gpu --enable-logging --print-to-pdf="%UserProfile%\Documents\Demofile.pdf" --no-margins --disable-extensions --print-to-pdf-no-header --disable-popup-blocking --run-all-compositor-stages-before-draw --disable-checker-imaging https://bling.com.br/doc.view.php?id=2565744b051b9f1d3177c30ccf1e65b2
PROs: The PDF comes out exactly as the HTML is rendered in a browser.
CONs: The page is saved as a PORTRAIT PDF, just as when printing from a browser (the command line default is landscape=false), so to change the layout you need another method.
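The command-line approach can also be scripted. The sketch below, which is an illustration and not part of the original answer, assembles a subset of the switches shown above and starts the browser from Node.js. The names `buildChromeArgs` and `printWithChrome` are hypothetical, and the path to the browser executable is an assumption that depends on the machine.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: builds the argument list from switches used in the command above.
function buildChromeArgs(url, outputPath) {
  return [
    '--headless',
    '--disable-gpu',
    `--print-to-pdf=${outputPath}`, // no extra quotes needed: execFile does not use a shell
    '--no-margins',
    '--print-to-pdf-no-header',
    url, // the page to print goes last
  ];
}

// Hypothetical wrapper: chromePath must point to the browser executable on your machine.
function printWithChrome(chromePath, url, outputPath, callback) {
  execFile(chromePath, buildChromeArgs(url, outputPath), callback);
}
```

Because `execFile` bypasses the shell, environment variables such as `%UserProfile%` are not expanded, so the output path should be resolved in JavaScript first (for example with `os.homedir()`).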