I'm trying to create a Document object whose URL is set to a specific custom value.
I've explored some options mentioned below but was not able to do it so far.
An example approach I've tried with an explanation of where it goes wrong and what I want instead:
// Run context is the browser, on a page served from http://localhost
// Example URL value: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/this
const loadDocument = async (url: URL) => {
  // Fetch the HTML from the URL and parse it into a document
  const docText = await (await fetch(url.href)).text()
  const doc = new DOMParser().parseFromString(docText, 'text/html')
  // The following will now be http://localhost, but I need it to be the URL the page was downloaded from,
  // i.e. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/this
  console.log(doc.URL)
  // As a consequence, any links to external resources that use relative paths
  // will be wrong as well
  console.log(doc.querySelector<HTMLLinkElement>('link[rel="stylesheet"]')?.href)
  // will be "http://localhost/static/css/main.1baf2b3e.chunk.css"
  // instead of "https://developer.mozilla.org/static/css/main.1baf2b3e.chunk.css"
}
The main thing I'm trying to accomplish is to get correct URLs for resources referenced through relative links. I could fix this manually by walking the DOM and rewriting every relative link against the original URL, but that seems error-prone. Hence the hope of solving the problem at the root by setting the correct base URL on the document instead.
Another thing I tried is document.implementation.createHTMLDocument(), which results in a document with about:blank as its URL, and no way to set it explicitly either =\
Trying to do an assignment to the URL property does not seem to do anything either (and it's explicitly marked as read-only in documentation)
Answer:
When you read those href properties, you get the attribute's relative path resolved to an absolute URL against the document's base URL, so the result depends on the context the document lives in. Rather than walking the entire DOM and rewriting every URL, I'd suggest injecting a <base> HTML element, which sets the base URL that every relative URL in the document is resolved against. That should solve your specific problem quickly.
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base
The <base> HTML element specifies the base URL to use for all relative URLs in a document. There can be only one <base> element in a document.
This assumes that all those relative URLs share the same base URL, which holds if you fetch the page from the URL it is normally served at. Be aware that CORS policies still apply, especially if the JavaScript on those pages needs to make AJAX requests to the original domain, so this solution does not cover every possible scenario.
<base href="https://thereal.com/absolute/url">
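Applied to the document from the question, the idea could look roughly like the sketch below. The helper names setBaseUrl and loadDocument are made up for illustration, and it assumes a browser context (fetch, DOMParser) plus a target server that allows the cross-origin request. If the fetched page already has its own <base href>, the sketch re-resolves that value against the page URL instead of adding a second element.

```typescript
// Hypothetical helper: make `doc` resolve relative URLs against `pageUrl`.
function setBaseUrl(doc: Document, pageUrl: string): void {
  const existing = doc.querySelector("base[href]");
  if (existing) {
    // The page ships its own <base>; its href may itself be relative,
    // so resolve it against the URL the page was downloaded from.
    const raw = existing.getAttribute("href") ?? "";
    existing.setAttribute("href", new URL(raw, pageUrl).href);
    return;
  }
  // No <base> yet: create one and make it the first child of <head>.
  const base = doc.createElement("base");
  base.setAttribute("href", pageUrl);
  doc.head.prepend(base);
}

// Hypothetical loader: fetch the page, parse it, then fix up the base URL.
async function loadDocument(url: URL): Promise<Document> {
  const response = await fetch(url.href);
  const html = await response.text();
  const doc = new DOMParser().parseFromString(html, "text/html");
  setBaseUrl(doc, url.href);
  return doc;
}
```

Note that doc.URL itself stays unchanged; only doc.baseURI and the resolved values of properties such as href and src should follow the injected element.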
Another interesting read is the description of the baseURI property on MDN:
The read-only baseURI property of the Node interface returns the absolute base URL of the document containing the node.
The base URL is used to resolve relative URLs when the browser needs to obtain an absolute URL, for example when processing the HTML <img> element's src attribute or the xlink:href or href attributes in SVG.
Although this property is read-only, its value is determined by an algorithm each time the property is accessed, and may change if the conditions changed.
The base URL is determined as follows:
- By default, the base URL is the location of the document (as determined by window.location).
- If it is an HTML Document and there is a <base> element in the document, the href value of the first Base element with such an attribute is used instead.
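To make the effect of the base URL concrete, here is a small sketch that uses the standard URL constructor to resolve relative references against two different bases, which is the same kind of resolution the quoted text describes. The helper name resolveAgainstBase and the images/logo.png path are made up for illustration; the other values come from the question.

```typescript
// Hypothetical helper: resolve a relative reference against a base URL.
function resolveAgainstBase(relative: string, baseUrl: string): string {
  return new URL(relative, baseUrl).href;
}

const pageUrl =
  "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/this";
const stylesheet = "/static/css/main.1baf2b3e.chunk.css";

// Base is the local page: this is what the question observes.
const viaLocalhost = resolveAgainstBase(stylesheet, "http://localhost/");
console.log(viaLocalhost);
// http://localhost/static/css/main.1baf2b3e.chunk.css

// Base is the original page URL: this is what a matching <base> gives you.
const viaPageUrl = resolveAgainstBase(stylesheet, pageUrl);
console.log(viaPageUrl);
// https://developer.mozilla.org/static/css/main.1baf2b3e.chunk.css

// A path-relative reference replaces only the last path segment of the base.
const pathRelative = resolveAgainstBase("images/logo.png", pageUrl);
console.log(pathRelative);
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/images/logo.png
```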