Why does math with ULong > 16 digits get wonky?

Time:07-05

I am working on a "simple" base converter that turns a ULong (base 10) into a String in an arbitrary base. Here I use 64 characters. The use case is to shorten ULong values that are stored as Strings anyway.

Public Class BaseConverter
    'base64, but any length would work
    Private Shared ReadOnly Characters() As Char = {"0"c, "1"c, "2"c, "3"c, "4"c, "5"c, "6"c, "7"c, "8"c, "9"c,
                                                    "a"c, "b"c, "c"c, "d"c, "e"c, "f"c, "g"c, "h"c, "i"c, "j"c, "k"c, "l"c, "m"c, "n"c, "o"c, "p"c, "q"c, "r"c, "s"c, "t"c, "u"c, "v"c, "w"c, "x"c, "y"c, "z"c,
                                                    "A"c, "B"c, "C"c, "D"c, "E"c, "F"c, "G"c, "H"c, "I"c, "J"c, "K"c, "L"c, "M"c, "N"c, "O"c, "P"c, "Q"c, "R"c, "S"c, "T"c, "U"c, "V"c, "W"c, "X"c, "Y"c, "Z"c,
                                                    " "c, "-"c}

    Public Shared Function Encode(number As ULong) As String
        Dim buffer = New Text.StringBuilder()
        Dim quotient = number
        Dim remainder As ULong
        Dim base = Convert.ToUInt64(Characters.LongLength)

        Do
            remainder = quotient Mod base
            quotient = quotient \ base
            buffer.Insert(0, Characters(remainder).ToString())
        Loop While quotient <> 0

        Return buffer.ToString()
    End Function

    Public Shared Function Decode(str As String) As ULong
        If String.IsNullOrWhiteSpace(str) Then Return 0

        Dim result As ULong = 0
        Dim base = Convert.ToUInt64(Characters.LongLength)
        Dim nPos As ULong = 0

        For i As Integer = str.Length - 1 To 0 Step -1
            Dim cPos As Integer = Array.IndexOf(Of Char)(Characters, str(i))
            result += (base ^ nPos) * Convert.ToUInt64(cPos)
            nPos += 1
        Next
        Return result
    End Function
End Class

It works fine for numbers of up to 16 digits, but as soon as a number has more digits, the decoded value comes back rounded.

17 digits result in jumps of 8, 18 digits result in jumps of 32 and 19 digits result in jumps of 256.

For example, these numbers around 42347959784570944 do not work and result in those blocks:

42347959784570939 > 2msPWXX00X > 42347959784570936 ERROR
42347959784570940 > 2msPWXX00Y > 42347959784570944 ERROR
42347959784570941 > 2msPWXX00Z > 42347959784570944 ERROR
42347959784570942 > 2msPWXX00  > 42347959784570944 ERROR
42347959784570943 > 2msPWXX00- > 42347959784570944 ERROR
42347959784570944 > 2msPWXX010 > 42347959784570944
42347959784570945 > 2msPWXX011 > 42347959784570944 ERROR
42347959784570946 > 2msPWXX012 > 42347959784570944 ERROR
42347959784570947 > 2msPWXX013 > 42347959784570944 ERROR
42347959784570948 > 2msPWXX014 > 42347959784570944 ERROR
42347959784570949 > 2msPWXX015 > 42347959784570952 ERROR

The problem has to be in the Decode() function, because the encoded strings are all distinct and only the decoded values collapse onto each other.

I put some tests on https://dotnetfiddle.net/7wAGLh, but I just can't find the issue.

Answer:

The exponent operator, ^, always returns a Double.
This means the entire expression (base ^ nPos) * Convert.ToUInt64(cPos) is evaluated and returned as Double (and then silently crammed into a ULong).

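That introduces the Double imprecision you are observing: a Double has a 53-bit significand, so above 2^53 (roughly 9.007 × 10^15, a 16-digit number) it can no longer represent every integer, and the gaps between representable values widen as the numbers grow.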
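The effect can be reproduced in isolation. The following is a minimal sketch (the module name DoublePrecisionDemo is made up for illustration) that checks the runtime type produced by ^ and round-trips one of the failing values from the question through a Double:

```vbnet
Option Strict On
Imports System

Module DoublePrecisionDemo
    Sub Main()
        Dim base As ULong = 64UL
        Dim nPos As ULong = 2UL

        ' The ^ operator yields a Double even when both operands are ULong.
        Dim power As Object = base ^ nPos
        Console.WriteLine(power.GetType().Name)   ' Double

        ' In this range a Double can only hold multiples of 8,
        ' so the low bits are lost on the way through.
        Dim original As ULong = 42347959784570945UL
        Dim roundTripped As ULong = CULng(CDbl(original))
        Console.WriteLine(roundTripped)           ' 42347959784570944
    End Sub
End Module
```

The second value matches the 42347959784570945 row of the table in the question.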

Using Option Strict On at all times is a good way to catch these errors.
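As a rough illustration of what Option Strict On changes (the module and variable names here are hypothetical), the implicit Double-to-ULong conversion becomes a compile-time error and has to be written out explicitly:

```vbnet
Option Strict On
Imports System

Module StrictDemo
    Sub Main()
        Dim base As ULong = 64UL
        Dim nPos As ULong = 3UL
        Dim result As ULong = 0UL

        ' With Option Strict On the next line is rejected by the compiler,
        ' because the right-hand side is a Double and assigning it to a
        ' ULong is a narrowing conversion:
        ' result += (base ^ nPos) * 5UL

        ' The conversion has to be spelled out, which makes it visible.
        result += CULng(base ^ nPos) * 5UL
        Console.WriteLine(result)   ' 1310720
    End Sub
End Module
```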

Provided that you know that (base ^ nPos) will not exceed the maximum value of ULong (which seems to be the assumption anyway), the fix is

result += CULng(base ^ nPos) * Convert.ToUInt64(cPos)
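An alternative that avoids Double altogether is to accumulate the result with integer arithmetic only (Horner's method). This is a sketch, not the asker's code: the module name BaseConverterSketch and the String-based alphabet are assumptions made for brevity. It may matter for bases that are not powers of two, where base ^ nPos itself can stop being exactly representable as a Double once it exceeds 2^53.

```vbnet
Option Strict On
Imports System

Module BaseConverterSketch
    ' Same 64 symbols, in the same order, as the Characters array in the question.
    Private ReadOnly Characters As String =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ -"

    Public Function Decode(str As String) As ULong
        ' IsNullOrEmpty rather than IsNullOrWhiteSpace:
        ' " " is itself a valid digit in this alphabet.
        If String.IsNullOrEmpty(str) Then Return 0UL

        Dim base As ULong = CULng(Characters.Length)
        Dim result As ULong = 0UL

        ' Read the most significant digit first: result = result * base + digit.
        ' Every intermediate value is a ULong, so no rounding can occur.
        For Each c As Char In str
            Dim cPos As Integer = Characters.IndexOf(c)
            If cPos < 0 Then
                Throw New ArgumentException("Invalid character: " & c.ToString())
            End If
            result = result * base + CULng(cPos)
        Next

        Return result
    End Function

    Sub Main()
        Console.WriteLine(Decode("10"))           ' 64
        Console.WriteLine(Decode("2msPWXX011"))   ' 42347959784570945
    End Sub
End Module
```

With VB's default setting of checked integer arithmetic, an input that does not fit into a ULong raises an OverflowException instead of silently producing a wrong value.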