CLI : .html as input -> execute the javascript -> save the result in a new file

Time:10-03

This question is about executing the JavaScript in an HTML file from the command line.

**** Very important: this is NOT about scraping. It is about "rendering", i.e. executing the JavaScript contained in a private .html file (not web accessible) and saving the result, with the JavaScript already executed, in a new .html file.

CONTEXT: I have private .html files (they contain CSS, JavaScript, etc.) that are still "unrendered", if you see what I mean.

WHAT I WANT TO ACCOMPLISH: in an automated PHP script => I create a raw (unexecuted) .html => pass the .html to a CLI tool => the JavaScript is executed => the result is saved in a new .html

I TRIED: google-chrome headless -> it can save as PNG or PDF, but I was unable to save the result as .html :(

OTHER ISSUE with google-chrome headless: the option --dump-dom saves the original .html, not the executed one...

Example:

google-chrome \
  --headless --hide-scrollbars --run-all-compositor-stages-before-draw \
  --virtual-time-budget=10000 --disable-translate --disable-popup-blocking \
  --disable-infobars --ignore-certificate-errors --autoplay-policy=no-user-gesture-required \
  --disable-gpu --dump-dom \
  "someHTML_with_CSS_AND_JAVASCRIPT.html" > final.html

Other command-line solutions such as CutyCapt, wkhtmltopdf, etc. have the same problem: they only save as PDF, JPG or PNG, and I want to save as .html.

CodePudding user response:

After rendering, a web page can generally be stored in four other formats besides its pre-render HTML/CSS/JS components.

The first two are the more familiar ones: a screen-rendered image (PNG is best) or encapsulated vectors (PDF is the best and simplest).

The other two are HTML-based formats, which are not always built the same way. Firefox has had problems saving the post-render web format this century (a 23-year-old bug), whereas it is more native to Edge (formerly IE) and the Chrome/Chromium variants.

One is to encapsulate the rendered page and its media in an html5X.zip file, as used by zip.ePub3; the other is a single MHTML file (.mhtml/.mht), and the easiest way to produce it is to right-click and save as a web page. "MHTML was proposed as an open standard, then circulated in a revised edition in 1999"

As described in the bug report, there are extensions for both Firefox and Chrome:

One of the web extensions that already makes this possible is called "Save Page WE". It has been actively maintained. Users can save a complete webpage or selected items. Further, users can save one or more selected webpages (i.e. selected browser tabs). An information bar at the top of each saved HTML file is supported as well and can be enabled in the settings. The web extension is available for Firefox and Chrome.

So, when saving this page as a single .mhtml file, the part for the top-left SVG Stack Overflow logo will start off as:

Content-Transfer-Encoding: quoted-printable
Content-Location: https://cdn.sstatic.net/Img/unified/sprites.svg?v=fcc0ea44ba27

<svg width=3D"189" height=3D"530" fill=3D"none" xmlns=3D"http://www.w3.org/=
2000/svg"><path d=3D"M48 280.8v7.6l8.5 7.6L73 281.2V273l-16.5 14.9-8.5-7.1z=
M22 324v3l4 4 7-6v-4l-7 6" fill=3D"#5EBA7D"/><path d=3D"M8 280.8v7.6l8.6 7.=
6L33 281.2V273l-16.4 14.9" fill=3D"#C9CBCF"/><path d=3D"M45 191h29l-14.4-15=
" fill=3D"#F48024"/><path d=3D"M5 191h29l-14.5-15" fill=3D"#C9CBCF"/><path =
d=3D"M59.6 243L74 228H45l14.6 15zM6.5 322.5L0 329h13" fill=3D"#F48024"/><pa=
th d=3D"M7.5 386.5L0 380v13l7.5-6.5zm47.5 87l-8-6.5v13l8-6.5zm-48.5 0L14 48=
0v-13l-7.5 6.5zm20-84L33 383H20M6.5 341.5L0 348h13M19.5 243L34 228H5M19.5 1=
20l2.9 9.2H32l-7.7 5.6 3 9.2-7.8-5.7-7.8 5.7 3-9.2-7.7-5.6h9.6" fill=3D"#C9=
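
The excerpt above is quoted-printable encoded: "=3D" stands for a literal "=", and a "=" at the very end of a line is a soft line break that joins it to the next line. As a minimal sketch (Python is used here only for illustration, and the byte string is just the opening tag copied from the excerpt), the standard-library quopri module can turn such a part back into plain markup:

```python
import quopri

# Opening <svg> tag of the saved part, as it appears in the MHTML file.
# The trailing "=" before the newline is a soft line break.
encoded = (
    b'<svg width=3D"189" height=3D"530" fill=3D"none" xmlns=3D"http://www.w3.org/=\n'
    b'2000/svg">'
)

# decodestring() replaces "=3D" with "=" and removes soft line breaks.
decoded = quopri.decodestring(encoded).decode("utf-8")
print(decoded)
```

This only decodes one part; a complete .mhtml file is a multipart MIME document, so each part would need to be located and decoded separately.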

The old switch for MHTML saving is --save-page-as-mhtml, and its removal from --headless mode has been proposed several times!

However, I could NOT get that to work in Edge. There are open issues where it does not work as expected (https://bugs.chromium.org/p/chromium/issues/detail?id=624045&q=save-page-as-mhtml&can=2), so I have to suggest that you need a manual emulation, e.g. using Puppeteer.

Personally, for small one-offs I use SendKeys from Windows. In Edge (Chrome?), keyboard scripting can save the current file.

Try it with this page:

Ctrl+S, Alt+T, Down, Down, Up, Enter, Enter

CodePudding user response:

@CherryDT pointed me to "chromium":

chromium --headless --dump-dom "raw.html" > "rendered.html"

It does what I needed ;)
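
For an automated pipeline, the command above can be wrapped in a small script. The following is only a sketch under assumptions: it is written in Python rather than PHP, the helper names build_command and render are hypothetical, and it assumes a binary named "chromium" is on the PATH. It uses exactly the flags from the command above and nothing else.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(browser: str, source: Path) -> list[str]:
    """Return the argv for a headless --dump-dom run on a local file."""
    return [browser, "--headless", "--dump-dom", str(source)]


def render(source: Path, target: Path, browser: str = "chromium") -> None:
    """Write what --dump-dom prints for `source` into `target`."""
    # Fail early with a clear message if the browser binary is missing.
    if shutil.which(browser) is None:
        raise FileNotFoundError(f"{browser!r} not found on PATH")
    result = subprocess.run(
        build_command(browser, source),
        capture_output=True,   # collect stdout instead of using a shell redirect
        encoding="utf-8",
        check=True,            # raise if the browser exits with an error
        timeout=60,            # assumption: one minute is enough per page
    )
    target.write_text(result.stdout, encoding="utf-8")
```

A call such as render(Path("raw.html"), Path("rendered.html")) would then correspond to the one-line command. Capturing stdout in the script instead of redirecting in a shell avoids leaving a half-written output file behind when the browser fails.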
