As we know, Go offers two encodings for decoding a standard base64 string: base64.StdEncoding and base64.RawStdEncoding. How do I choose the right one to decode a given base64 string?
If the wrong encoding is used, for example RawStdEncoding to decode a StdEncoding string, the error illegal base64 data at input byte xxx is returned.
Per the docs:
const (
StdPadding rune = '=' // Standard padding character
NoPadding rune = -1 // No padding
)
RawStdEncoding is the standard raw, unpadded base64 encoding, as defined in RFC 4648 section 3.2. This is the same as StdEncoding but omits padding characters.
Should we distinguish them by checking whether the string ends with StdPadding? Code snippet:
lastByte := s[len(s)-1:]
if lastByte == string(base64.StdPadding) {
	base64.StdEncoding.DecodeString(s)
} else {
	base64.RawStdEncoding.DecodeString(s)
}
Is that an elegant way to do it, or am I missing something? What is the elegant way to decode a base64 string?
Update:
Maybe one crude way to do it is to fall back on error, as below:
rawByte, err := base64.StdEncoding.DecodeString(s)
if err != nil {
	rawByte, err = base64.RawStdEncoding.DecodeString(s)
}
CodePudding user response:
As we know, there are two methods to decode base64 string in go base64.StdEncoding or base64.RawStdEncoding.
There's also base64.URLEncoding, which uses the characters - and _ as substitutes for the URL-unsafe base64 characters + and /.
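To make the alphabet difference concrete, here is a minimal sketch. The helper name encodeBoth is made up for illustration; the input bytes are chosen so that they land on alphabet indexes 62 and 63, the only two positions where the standard and URL alphabets differ.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeBoth is a hypothetical helper that encodes the same bytes with the
// standard alphabet and with the URL-safe alphabet (both padded).
func encodeBoth(b []byte) (std, url string) {
	return base64.StdEncoding.EncodeToString(b), base64.URLEncoding.EncodeToString(b)
}

func main() {
	// 0xfb 0xff splits into the 6-bit groups 62, 63 and 60.
	std, url := encodeBoth([]byte{0xfb, 0xff})
	fmt.Println(std) // +/8=
	fmt.Println(url) // -_8=
}
```

A string containing + or / cannot have come from the URL alphabet, and one containing - or _ cannot have come from the standard alphabet.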
Should we distinguish them by checking whether the string ends with StdPadding? Code snippet
This won't work. There is a 1 in 3 chance that a padded base64 encoding has no visible padding, namely whenever the input length is a multiple of 3:
b := []byte("abc123") // len(b) % 3 == 0 - no padding
fmt.Println(base64.StdEncoding.EncodeToString(b)) // YWJjMTIz
fmt.Println(base64.RawStdEncoding.EncodeToString(b)) // YWJjMTIz
https://play.golang.org/p/LMtIHlyXdn7
So how do you tell them apart and determine which encoding was used?
Yes, you can double-decode as in your updated question:
rawByte, err := base64.StdEncoding.DecodeString(s)
if err != nil {
	rawByte, err = base64.RawStdEncoding.DecodeString(s)
}
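Wrapped into a function, the fallback could look like the sketch below. The name decodeStdOrRaw is hypothetical, and the sketch assumes the input uses the standard alphabet (not the URL one) and contains nothing but the base64 data.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeStdOrRaw is a hypothetical helper: it tries the padded standard
// decoder first and falls back to the unpadded one if that fails.
func decodeStdOrRaw(s string) ([]byte, error) {
	b, err := base64.StdEncoding.DecodeString(s)
	if err == nil {
		return b, nil
	}
	// The error returned here is the one from the second attempt.
	return base64.RawStdEncoding.DecodeString(s)
}

func main() {
	// Padded, unpadded, and a string that is valid under both encodings.
	for _, s := range []string{"YWI=", "YWI", "YWJj"} {
		b, err := decodeStdOrRaw(s)
		fmt.Printf("%q -> %q, err=%v\n", s, b, err)
	}
}
```

When the encoded length is a multiple of 4 and there is no padding, both decoders accept the string and yield the same bytes, so the order of the attempts does not change the result.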
There are some tricks you can employ to make some educated guesses. For example:
e := base64.StdEncoding.EncodeToString(b) // always produces a length that is a multiple of 4
if len(e)%4 != 0 {
	// cannot be base64.StdEncoding - so try base64.RawStdEncoding?
}
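The length check can be turned into a small selector, sketched below. The name guessStdVariant is hypothetical, and the sketch assumes the string uses the standard alphabet and contains no newlines or other surrounding data, since those would throw off the length.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// guessStdVariant is a hypothetical helper that picks between the padded and
// unpadded standard encodings based on the encoded length alone.
func guessStdVariant(s string) *base64.Encoding {
	if len(s)%4 != 0 {
		// Padded output is always a multiple of 4 long, so this must be unpadded.
		return base64.RawStdEncoding
	}
	// A multiple-of-4 length is either padded, or needs no padding at all;
	// StdEncoding decodes both cases.
	return base64.StdEncoding
}

func main() {
	for _, s := range []string{"YWI=", "YWI", "YWJj"} {
		b, err := guessStdVariant(s).DecodeString(s)
		fmt.Printf("%q -> %q, err=%v\n", s, b, err)
	}
}
```

This is still a guess: it only picks a decoder that can succeed, and the decode call remains the real validation.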
CodePudding user response:
If you get illegal base64 data at input byte ..., then:
- either you used the wrong base64 decoder, or
- there's more data after the base64 string that must be stripped before invoking the decoder, or
- the input is not base64.
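To narrow down which of these cases applies, the byte offset in the error can be read programmatically: the decoder reports it as a base64.CorruptInputError. A minimal sketch, where corruptOffset is a hypothetical helper and the use of StdEncoding is an assumption about what the sender used:

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// corruptOffset is a hypothetical helper. It decodes s with StdEncoding
// (an assumption) and, if decoding fails with a base64.CorruptInputError,
// returns the offending byte offset and true.
func corruptOffset(s string) (int64, bool) {
	_, err := base64.StdEncoding.DecodeString(s)
	var cie base64.CorruptInputError
	if errors.As(err, &cie) {
		return int64(cie), true
	}
	return 0, false
}

func main() {
	// Valid input, trailing data after the padding, and a non-alphabet character.
	for _, s := range []string{"YWI=", "YWI=trailing", "YW*j"} {
		off, bad := corruptOffset(s)
		fmt.Printf("%q: corrupt=%v offset=%d\n", s, bad, off)
	}
}
```

Looking at the input around the reported offset usually shows whether the problem is missing or unexpected padding, trailing data, or a character from a different alphabet.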
Should we distinguish them by checking whether the string ends with StdPadding?
No. Just as you know that the data is base64-encoded at all, you should also know exactly how it is encoded and use, e.g., either base64.StdEncoding or base64.RawStdEncoding, not both. You don't guess these things; you simply use the decoder that corresponds to the encoding used by the sender.
Base64 encoding can differ by:
- padded/unpadded (no =s at the end)
- standard (+, /) or URL (-, _) alphabet
- with/without newlines (e.g. MIME splits lines at 76 characters, PEM at 64)
You can visually inspect the encoded string to guess the encoding scheme. But note that padding is not always present: it depends on whether the length of the source data is a multiple of 3, since each group of 3 bytes is encoded as four 6-bit characters.
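The relationship between source length and padding can be checked with a short sketch. The helper paddingCount is hypothetical; it only restates the 3-bytes-to-4-characters rule and is compared against what StdEncoding actually produces.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// paddingCount is a hypothetical helper returning how many '=' characters a
// padded base64 encoding of n source bytes ends with.
func paddingCount(n int) int {
	return (3 - n%3) % 3
}

func main() {
	for _, s := range []string{"a", "ab", "abc"} {
		e := base64.StdEncoding.EncodeToString([]byte(s))
		fmt.Printf("%d byte(s): %s (expected padding %d, actual %d)\n",
			len(s), e, paddingCount(len(s)), strings.Count(e, "="))
	}
}
```

Only source lengths that leave a remainder of 1 or 2 when divided by 3 produce padding, so a missing = says nothing about which encoding the sender used.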