Uri.IdnHost strange behaviour for long hosts

Time:11-09

Uri.IdnHost does not work for long hosts. Is this normal behavior? How can this be fixed?

void checkIDN(string urlString)
{
    var uri = new Uri(urlString);
    Console.WriteLine($"{uri.Host,-40} ---> {uri.IdnHost}");
}
checkIDN("https://единая-дата-объединения-застройщиков.рф");
checkIDN("https://единая-дата-объединения-застройщико.рф");
checkIDN("https://единая-дата-объединения-застройщик.рф");
checkIDN("https://единая-дата-объединения-застройщи.рф");
checkIDN("https://единая-дата-объединения-застройщ.рф");
checkIDN("https://единая-дата-объединения-застрой.рф");
checkIDN("https://единая-дата-объединения-застро.рф");

CodePudding user response:

For reference, your code produces the following results:

единая-дата-объединения-застройщиков.рф ---> единая-дата-объединения-застройщиков.рф

единая-дата-объединения-застройщико.рф ---> единая-дата-объединения-застройщико.рф

единая-дата-объединения-застройщик.рф ---> единая-дата-объединения-застройщик.рф

единая-дата-объединения-застройщи.рф ---> единая-дата-объединения-застройщи.рф

единая-дата-объединения-застройщ.рф ---> единая-дата-объединения-застройщ.рф

единая-дата-объединения-застрой.рф ---> xn------5cdbacgvcedib5aejbw8dkbql3czamp4tlfqa.xn--p1ai

единая-дата-объединения-застро.рф ---> xn------5cdbacgvcedib5aejb7fkbpl1cyalp6s6eqa.xn--p1ai

Note that as soon as the part of the host name before ".рф" exceeds 31 characters, the conversion stops working as expected: IdnHost returns the Unicode name unchanged.

According to the RFCs, a domain name can be up to 255 octets long (253 characters in the usual dotted text form), and each label (labels are separated by dots) can be up to 63 octets. For IDN names, though, we should not count the Unicode characters directly but instead count the characters of the already converted (Punycode) name.

However, according to this code I found in the .NET source, a label is measured by counting its characters, with each character above 0xFF simply counted as 2:

count++;
if (*newPos > 0xFF)
    count++; // counts for two octets

And then it makes a check:

if (... (labelHasUnicode ? count + 4 : count) > 63 ...)
{
    return false;
}
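To make the arithmetic concrete, the two fragments above can be combined into a small sketch. EstimateLabelLength is a hypothetical helper, not the actual .NET implementation, and treating any non-ASCII character as setting labelHasUnicode is an assumption of this sketch.

```csharp
using System;

// Hypothetical helper modelled on the quoted fragments: every character
// counts once, characters above 0xFF count twice, and a label that contains
// non-ASCII characters gets 4 added before the comparison with 63.
static int EstimateLabelLength(string label)
{
    int count = 0;
    bool labelHasUnicode = false;
    foreach (char c in label)
    {
        count++;
        if (c > 0xFF)
            count++; // counts for two octets
        if (c > 0x7F)
            labelHasUnicode = true; // assumption: any non-ASCII char sets the flag
    }
    return labelHasUnicode ? count + 4 : count;
}

foreach (string label in new[]
{
    "единая-дата-объединения-застрой",      // 31 characters
    "единая-дата-объединения-застройщ",     // 32 characters
    "единая-дата-объединения-застройщиков", // 36 characters
})
{
    int estimate = EstimateLabelLength(label);
    Console.WriteLine($"{label.Length} chars -> estimate {estimate}, rejected: {estimate > 63}");
}
```

Under these assumptions the 31-character label comes out at exactly 63 and passes, while the 32-character label comes out at 65 and is rejected (the 36-character one reaches 73). That matches the cut-off seen in the output above.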

Long story short: it does not consider your domain name valid, because it thinks the label length limit is violated. And since the name is not treated as a valid domain name, it is not converted to IDN. You can verify that it is rejected by running:

Uri.CheckHostName("единая-дата-объединения-застройщиков.рф");

This returns "Unknown", while for a valid domain name it would return "Dns".

I might be missing something, but I don't agree with this: if we convert the label "единая-дата-объединения-застройщиков" to Punycode, we get this string:

XN------5CDBACGLLCEEIB7AFJBDUX5CKBTLD7C3AQP8USA0OQA

It contains 51 characters, so it does not violate the 63-character label limit, and the whole thing is a valid domain name.
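The converted length can be checked directly rather than estimated. This is a minimal sketch that assumes IdnMapping with its default settings; the variable names are only illustrative.

```csharp
using System;
using System.Globalization;

// Convert a single label to its ASCII (Punycode) form and measure the result.
var mapping = new IdnMapping();
string label = "единая-дата-объединения-застройщиков";
string ascii = mapping.GetAscii(label);

// The 63-character limit applies to this converted form.
Console.WriteLine($"{ascii} ({ascii.Length} characters, limit 63)");
```

Comparing ascii.Length with 63 is the check the RFC limit actually calls for, as opposed to the doubled character count.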

I'd file a bug in the .NET issue tracker describing this; maybe the maintainers can clarify.

As a workaround, you can use the IdnMapping class:

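using System;
using System.Globalization; // IdnMapping lives here

static void checkIDN(string urlString) {
    var uri = new Uri(urlString);
    // Convert the host directly, bypassing Uri's own label-length check.
    var m = new IdnMapping();
    var idn = m.GetAscii(uri.Host);
    Console.WriteLine($"{uri.Host,-40} ---> {idn}");
}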

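IdnMapping does not apply the label-length check that Uri performs; it simply converts the string to its IDN (Punycode) form, throwing an ArgumentException if the name is invalid.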
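If you would rather keep using Uri.IdnHost where it works, one option is to fall back to IdnMapping only when the result still contains non-ASCII characters. GetAsciiHost below is a hypothetical helper name, and this is a sketch under that assumption rather than a tested library routine.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: prefer Uri.IdnHost, and convert with IdnMapping only
// when IdnHost handed back a host that is still not pure ASCII.
static string GetAsciiHost(Uri uri)
{
    string host = uri.IdnHost;
    return host.All(c => c <= 0x7F) ? host : new IdnMapping().GetAscii(host);
}

var uri = new Uri("https://единая-дата-объединения-застройщиков.рф");
Console.WriteLine($"{uri.Host,-40} ---> {GetAsciiHost(uri)}");
```

This way hosts that Uri already converts are left alone, and only the affected long names go through IdnMapping.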
