How to generate a Youtube ID in Go?

Time:10-23

I'm assuming all I need to do is encode a 64-bit number as base64 to get an 11-character Youtube identifier. I wrote this Go program: https://play.golang.org/p/2nuA3JxVMd0

package main

import (
    "crypto/rand"
    "encoding/base64"
    "encoding/binary"
    "fmt"
    "math"
    "math/big"
    "strings"
)

func main() {

    // For example Youtube uses 11 characters of base64.
    // How many base64 characters does it take to express a 64-bit number? (2^6)^x = 2^64, so x = 64/6 = 10.666666666 … i.e. eleven rounded up.

    // Generate a 64 bit number
    val, _ := randint64()
    fmt.Println(val)

    // Encode the 64 bit number
    b := make([]byte, 8)
    binary.LittleEndian.PutUint64(b, uint64(val))
    encoded := base64.StdEncoding.EncodeToString([]byte(b))
    fmt.Println(encoded, len(encoded))

    // https://youtu.be/gocwRvLhDf8?t=75
    ytid := strings.ReplaceAll(encoded, "+", "-")
    ytid = strings.ReplaceAll(ytid, "/", "_")
    fmt.Println("Youtube ID from 64 bit number:", ytid)

}

func randint64() (int64, error) {
    val, err := rand.Int(rand.Reader, big.NewInt(int64(math.MaxInt64)))
    if err != nil {
        return 0, err
    }
    return val.Int64(), nil
}

But it has two issues:

  1. The identifier is 12 characters instead of the expected 11
  2. The encoded base64 suffix is "=" which means that it didn't have enough to encode?

So where am I going wrong?

CodePudding user response:

tl;dr

An 8-byte int64 (whatever its value) always encodes to 11 base64 characters followed by a single padding character =, so you can reliably do this to get your 11-character YouTube ID:

var replacer = strings.NewReplacer(
    "+", "-",
    "/", "_",
)

ytid := replacer.Replace(encoded[:11])

or (H/T @Crowman & @Peter) one can encode without padding and without replacing + and / by using base64.RawURLEncoding:

//encoded := base64.StdEncoding.EncodeToString(b) // may include + or /

ytid := base64.RawURLEncoding.EncodeToString(b)  // produces URL-friendly - and _

https://play.golang.org/p/AjlvtfR7RWD
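
To see that the two approaches above agree, here is a minimal, self-contained sketch. The names urlSafe, viaTrim, viaRawURL and newID are hypothetical helpers made up for illustration, and viaTrim assumes the input is exactly 8 bytes (so the encoded form is 11 characters plus one =). newID reads 8 bytes straight from crypto/rand, which uses all 64 bits; rand.Int with a limit of math.MaxInt64, as in the question, never sets the top bit.

```go
package main

import (
    "crypto/rand"
    "encoding/base64"
    "fmt"
    "strings"
)

// urlSafe swaps the two standard-alphabet characters that are awkward in URLs.
var urlSafe = strings.NewReplacer("+", "-", "/", "_")

// viaTrim encodes with the padded standard alphabet, drops the trailing "="
// and replaces "+" and "/". It assumes b holds exactly 8 bytes.
func viaTrim(b []byte) string {
    return urlSafe.Replace(base64.StdEncoding.EncodeToString(b)[:11])
}

// viaRawURL lets the encoder do both steps at once.
func viaRawURL(b []byte) string {
    return base64.RawURLEncoding.EncodeToString(b)
}

// newID (hypothetical) renders 8 random bytes as an 11-character identifier.
func newID() (string, error) {
    b := make([]byte, 8)
    if _, err := rand.Read(b); err != nil {
        return "", err
    }
    return viaRawURL(b), nil
}

func main() {
    id, err := newID()
    if err != nil {
        panic(err)
    }
    fmt.Println(id, len(id))

    // Bytes chosen so that the standard encoding contains both "+" and "/".
    b := []byte{0xfb, 0xff, 0xfe, 0, 0, 0, 0, 0}
    fmt.Println(viaTrim(b), viaRawURL(b))
}
```

If you only ever need the URL-safe form, viaRawURL is the simpler choice because it does not depend on the input length.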


Each character (one byte, i.e. 8 bits) of base64 output carries 6 bits of input. So the number of output bytes for a given number of input bytes is:

out = in * 8 / 6

or

out = in * 4 / 3

With a divisor of 3, some input lengths leave the last output byte only partly used. If the input length in bytes is:

  • divisible by 3 - the output ends on a byte boundary and no padding is needed
  • not divisible by 3 - the last output byte is only partly used and padding is required

In the case of 8 bytes of input:

out = 8 * 4 / 3 = 10 2/3

This fills 10 base64 output bytes completely and one partially (the 2/3), so the result is 11 base64 bytes plus padding that indicates how much was wasted.

Padding is written with the = character, and the number of = characters indicates how much of the final output byte was "wasted":

waste   padding
=====   =======   
0       
1/3     =
2/3     ==

Since the output uses 10 2/3 bytes, 1/3 of a byte is "wasted", so the padding is a single =.

So base64 encoding 8 input bytes will always produce 11 base64 bytes followed by a single = padding character to produce 12 bytes in total.

CodePudding user response:

The = in base64 is padding. For a 64-bit number this padding carries no information, so the identifier does not need 12 characters. But why is it there at all?

See the source of the Encoding.Encode function:

func (enc *Encoding) Encode(dst, src []byte) {
    if len(src) == 0 {
        return
    }
    // enc is a pointer receiver, so the use of enc.encode within the hot
    // loop below means a nil check at every operation. Lift that nil check
    // outside of the loop to speed up the encoder.
    _ = enc.encode

    di, si := 0, 0
    n := (len(src) / 3) * 3
    //https://golang.org/src/encoding/base64/base64.go

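In the (len(src) / 3) * 3 part, the encoder works in groups of 3 input bytes (24 bits, i.e. 4 output characters), not in units of 6 bits.

So with padding enabled, the output of this function always has a length that is a multiple of 4. If your input is always 64 bits, you can delete the = after encoding and add it back before decoding.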

for i := 8; i <= 18; i++ {
    b := make([]byte, i)
    binary.LittleEndian.PutUint64(b, uint64(0))
    encoded := base64.StdEncoding.EncodeToString(b)
    fmt.Println(encoded)
}

AAAAAAAAAAA=
AAAAAAAAAAAA
AAAAAAAAAAAAAA==
AAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAA==
AAAAAAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAA==
AAAAAAAAAAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAAAAAAAAAA
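
A minimal sketch of the "delete = after encoding, add it back before decoding" idea for 64-bit values. encodeID and decodeID are hypothetical names, and the little-endian byte order simply follows the code in the question.

```go
package main

import (
    "encoding/base64"
    "encoding/binary"
    "fmt"
)

// encodeID encodes the 8 little-endian bytes of v and drops the trailing "=".
func encodeID(v uint64) string {
    b := make([]byte, 8)
    binary.LittleEndian.PutUint64(b, v)
    return base64.StdEncoding.EncodeToString(b)[:11]
}

// decodeID restores the "=" and decodes back to the original value.
func decodeID(id string) (uint64, error) {
    if len(id) != 11 {
        return 0, fmt.Errorf("want 11 characters, got %d", len(id))
    }
    b, err := base64.StdEncoding.DecodeString(id + "=")
    if err != nil {
        return 0, err
    }
    return binary.LittleEndian.Uint64(b), nil
}

func main() {
    v := uint64(1234567890)
    id := encodeID(v)
    back, err := decodeID(id)
    if err != nil {
        panic(err)
    }
    fmt.Println(id, len(id), back == v)
}
```

Using base64.RawStdEncoding for both directions should give the same result without touching the = by hand.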

What do I mean by 6 (or 3)?

base64 uses 64 characters; each character maps to one 6-bit value (from 000000 to 111111).

example:

A 64-bit value (uint64):

11154013587666973726

binary representation:

1001101011001011000001000100001011110000110001010011010000011110

Split into groups of six bits (the leftmost group is padded with two leading zeros):

001001,101011,001011,000001,000100,001011,110000,110001,010011,010000,011110

J, r, L, B, E, L, w, x, T, Q, e
