JavaScript `URL`: when to encode when setting `pathname`?

Time:02-15

When setting the pathname of a URL, when should you encode the value you are setting it to?

When I say URL I mean this API: https://developer.mozilla.org/en-US/docs/Web/API/URL

When I say "setting the pathname" I mean to do this:

url.pathname = 'some/path/to/a/resource.html';

Based on the MDN documentation, I would think the answer is "you shouldn't need to", as there is an example covering this case:

URLs are encoded according to the rules found in RFC 3986. For instance:

url.pathname = 'démonstration.html';
console.log(url.href); // "http://www.example.com/d%C3%A9monstration.html"

However, I have run into a case where it seems I do need to encode the value I am setting pathname to:

url.pathname = 'atest/New Folder1234/!@#$%^&*().html';
console.log(url.href);

I would expect this to output: http://example.com/atest/New%20Folder1234/!%40%23%24%25%5E%26*().html

But instead I am getting: http://example.com/atest/New%20Folder1234/!@%23$%^&*().html

It seems to get what I expect I have to do:

url.pathname = 'atest/New Folder1234/!@#$%^&*().html'.split('/').map(encodeURIComponent).join('/');

What is going on here? I cannot find anything on the MDN doc page for either URL or pathname that explains this. I took a quick look through RFC 3986, but that just seems to describe the URI syntax. I have run some experiments to look for a pattern, but nothing stands out to me.

Answer:

See the specification for path state, in particular...

UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.

with the path percent-encode set being defined as...

the query percent-encode set and U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).

and the query percent-encode set being...

the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).

You can keep diving down the rabbit hole if you want, but I feel that's enough.

Note that none of these sets include @, $, %, ^ or &, which are the characters you pointed out. The space and # are in the query percent-encode set, so the setter does escape those.
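
One way to check this is to push individual characters through the setter and look at what comes back. This is only a sketch: it assumes a runtime with a spec-compliant global URL (a modern browser or Node.js), pathAfterSetting is a hypothetical helper name, and example.com is a placeholder host.

```javascript
// Hypothetical helper: set pathname on a throwaway URL and return
// what the URL parser actually stored.
function pathAfterSetting(path) {
  const url = new URL('http://example.com/');
  url.pathname = path;
  return url.pathname;
}

// Characters in the path percent-encode set are escaped by the setter...
console.log(pathAfterSetting('a b')); // "/a%20b"
console.log(pathAfterSetting('a#b')); // "/a%23b"
console.log(pathAfterSetting('a?b')); // "/a%3Fb"

// ...while characters outside the set pass through untouched.
console.log(pathAfterSetting('a@b$c&d')); // "/a@b$c&d"
```

The ? and # cases are escaped rather than treated as the start of a query or fragment because the setter only ever parses a path.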

Compare these to the ECMAScript specification for Encode, the abstract operation behind encodeURIComponent, which escapes far more characters.
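
To see the two behaviours side by side, here is a minimal sketch that sets the same path twice, once raw and once with each segment run through encodeURIComponent first. It assumes a global URL as in browsers and Node.js; encodePathSegments is a hypothetical helper name, and the sample is a shortened version of the question's path (without % and ^).

```javascript
// Hypothetical helper: escape every path segment as strictly as
// encodeURIComponent does, but keep "/" as the segment separator.
function encodePathSegments(path) {
  return path.split('/').map(encodeURIComponent).join('/');
}

const raw = 'atest/New Folder1234/!@#$&*().html';
const url = new URL('http://example.com/');

// Let the pathname setter do the encoding on its own.
url.pathname = raw;
const lenient = url.pathname;

// Pre-encode each segment, then hand the result to the setter.
url.pathname = encodePathSegments(raw);
const strict = url.pathname;

console.log(lenient); // "/atest/New%20Folder1234/!@%23$&*().html"
console.log(strict);  // "/atest/New%20Folder1234/!%40%23%24%26*().html"
```

The pre-encoded value is not encoded a second time, because % is not in the path percent-encode set. So if you want @, $ and & escaped in the path, encoding the segments yourself before assigning is a reasonable approach.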
