When setting the pathname of a URL, when should you encode the value you are setting it to?
When I say URL, I mean this API: https://developer.mozilla.org/en-US/docs/Web/API/URL
When I say "setting the pathname" I mean to do this:
url.pathname = 'some/path/to/a/resource.html';
Based on the MDN documentation, I would think the answer is "you shouldn't need to", as there is an example covering this case:
URLs are encoded according to the rules found in RFC 3986. For instance:
url.pathname = 'démonstration.html';
console.log(url.href); // "http://www.example.com/d%C3%A9monstration.html"
However, I have run into a case where it seems I do need to encode the value I am setting pathname to:
url.pathname = 'atest/New Folder1234/!@#$%^&*().html';
console.log(url.href);
I would expect this to output:
http://example.com/atest/New%20Folder1234/!%40%23%24%25%5E%26*().html
But instead I am getting:
https://example.com/atest/New%20Folder1234/!@%23$%^&*().html
It seems that, to get what I expect, I have to do:
url.pathname = 'atest/New Folder1234/!@#$%^&*()'.split('/').map(encodeURIComponent).join('/')
What is going on here? I cannot find anything on the MDN doc page for either URL or pathname that explains this. I took a quick look through RFC 3986, but that just seems to describe the URI syntax. I have run some experiments to look for a pattern in which characters get encoded, but nothing stands out to me.
CodePudding user response:
See the specification for path state, in particular...
UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.
with the path percent-encode set being defined as...
the query percent-encode set and U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).
and the query percent-encode set being...
the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).
You can keep diving down the rabbit hole if you want, but I feel that's enough.
Note that none of these sets include @, $, %, ^, or &, which are the characters you pointed out, so the pathname setter leaves them as they are.
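To make the difference concrete, here is a minimal sketch of what the setter does with characters inside and outside the sets quoted above. The host example.com and the sample paths are placeholders chosen for illustration, and the expected values assume an engine that follows the rules as quoted.

```javascript
// Sketch: see which characters the pathname setter escapes on its own.
// The host and the sample paths are placeholders for illustration only.
const url = new URL('https://example.com/');

// Space, '#' and '?' are in the path percent-encode set, so they are escaped.
url.pathname = 'a b/c#d?e.html';
const escaped = url.pathname;

// '@', '$' and '&' are not in that set, so they pass through unchanged.
url.pathname = 'x/@$&.html';
const untouched = url.pathname;

console.log(escaped);   // "/a%20b/c%23d%3Fe.html"
console.log(untouched); // "/x/@$&.html"
```

Note that '#' and '?' are escaped rather than treated as the start of a fragment or query, because the setter only ever writes to the path.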
Compare these to the ECMAScript specification for Encode (the operation behind encodeURIComponent), which escapes far more characters.
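If you do want every reserved character escaped, one option is to encode each segment yourself before assigning, as in the question. The sketch below wraps that in a helper; encodePathSegments is a hypothetical name, not part of the URL API, and it assumes the input is raw (not already percent-encoded) and that every '/' in it is meant as a separator.

```javascript
// Hypothetical helper: percent-encode each path segment with
// encodeURIComponent, keeping '/' as the segment separator.
// Assumes the input is not already encoded; otherwise '%' becomes '%25'.
function encodePathSegments(path) {
  return path.split('/').map(encodeURIComponent).join('/');
}

const target = new URL('https://example.com/');
target.pathname = encodePathSegments('atest/New Folder1234/!@#$%^&*().html');

console.log(target.href);
// "https://example.com/atest/New%20Folder1234/!%40%23%24%25%5E%26*().html"
```

The setter does not touch the result again, because a '%' followed by two hex digits is kept as is and the remaining characters are outside the path percent-encode set.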