decoding base64 encoded text with POSIX awk

Time:01-07

In a bash script that I'm writing for Linux/BSD/Solaris I need to decode more than a hundred thousand base64-encoded text strings. Because I don't want to fork a non-portable base64 binary from awk once per string, I wrote a function that does the decoding.

Here's the code of my base64_decode function:

function base64_decode(str,    out,i,n,v) {
    out = ""
    if ( ! ("A" in _BASE64_DECODE_c2i) )
        for (i = 1; i <= 64; i++)
            _BASE64_DECODE_c2i[substr("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",i,1)] = i-1
    i = 0
    n = length(str)
    while (i <= n) {
        v = _BASE64_DECODE_c2i[substr(str,++i,1)] * 262144 + \
            _BASE64_DECODE_c2i[substr(str,++i,1)] * 4096 + \
            _BASE64_DECODE_c2i[substr(str,++i,1)] * 64 + \
            _BASE64_DECODE_c2i[substr(str,++i,1)]
        out = out sprintf("%c%c%c", int(v/65536), int(v/256), v)
    }
    return out
}

SAMPLE EXAMPLE

Let's say that I ran the following command:

ldapsearch -LLL -o ldif-wrap=no -b dc=example,dc=com -h 1.2.3.4 '(|(uid=*)(GroupCode='*'))' uid givenName sn GroupCode memberUid
dn: uid=jsmith,ou=users,dc=example,dc=com
givenName: John
sn: SMITH
uid: jsmith

dn: uid=jdoe,ou=users,dc=example,dc=com
uid: jdoe
givenName:: SmFuZQ==
sn:: RE9F

dn: cn=group1,ou=groups,dc=example,dc=com
GroupCode: 025496
memberUid:: amRvZQ==
memberUid: jsmith

Now I want to output the givenName of the users that are members of the group whose GroupCode is 025496. Here is an awk script that does so:

LANG=C command -p awk -F '\n' -v RS='' -v GroupCode=025496 '
    {
        delete attrs
        for (i = 2; i <= NF; i++) {
            match($i,/::? /)
            key = substr($i,1,RSTART-1)
            val = substr($i,RSTART+RLENGTH)
            if (RLENGTH == 3)
                val = base64_decode(val)
            attrs[key] = ((key in attrs) ? attrs[key] SUBSEP val : val)
        }
        if ( /\nuid:/ )
            givenName[ attrs["uid"] ] = attrs["givenName"]
        else
            memberUid[ attrs["GroupCode"] ] = attrs["memberUid"]
    }
    END {
        n = split(memberUid[GroupCode],uid,SUBSEP)
        for ( i = 1; i <= n; i++ )
            print givenName[ uid[i] ]
    }

    function base64_decode(str,    out,i,n,v) { ... }
'

On BSD and Solaris the output is:

Jane
John

On Linux (GNU awk) the output is:


John

What am I doing wrong?

CodePudding user response:

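Your function generates NUL bytes when its argument (the encoded string) ends with padding characters (=), and the loop condition i <= n adds one more all-NUL group whenever the length is a multiple of 4. GNU awk keeps those bytes in the string, so a decoded uid no longer matches the plain one. Below is a corrected version of your while loop: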

while (i < n) {
    v = _BASE64_DECODE_c2i[substr(str,1+i,1)] * 262144 + \
        _BASE64_DECODE_c2i[substr(str,2+i,1)] * 4096 + \
        _BASE64_DECODE_c2i[substr(str,3+i,1)] * 64 + \
        _BASE64_DECODE_c2i[substr(str,4+i,1)]
    i += 4
    if (v%256 != 0)
        out = out sprintf("%c%c%c", v/65536, v/256, v)
    else if (v/256%256 != 0)
        out = out sprintf("%c%c", v/65536, v/256)
    else
        out = out sprintf("%c", v/65536)
}
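
For anyone who wants to try this, here is a rough sketch of that loop dropped back into the question's function and run on the encoded values from the sample. The shell wrapper name b64_decode_lines is made up for this illustration. I have also reduced each byte modulo 256 before handing it to %c, which is my own precaution so the result does not depend on how a given awk truncates larger numbers. The zero-byte test assumes the decoded text never contains NUL bytes of its own.

```shell
# b64_decode_lines is a hypothetical wrapper name used only for this sketch.
# It reads one base64 string per line and prints the decoded text.
b64_decode_lines() {
    LC_ALL=C awk '
    function base64_decode(str,    out,i,n,v) {
        out = ""
        if ( ! ("A" in _BASE64_DECODE_c2i) )
            for (i = 1; i <= 64; i++)
                _BASE64_DECODE_c2i[substr("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",i,1)] = i-1
        i = 0
        n = length(str)
        while (i < n) {
            # "=" is not in the table, so padding contributes 0 to v
            v = _BASE64_DECODE_c2i[substr(str,1+i,1)] * 262144 + \
                _BASE64_DECODE_c2i[substr(str,2+i,1)] * 4096 + \
                _BASE64_DECODE_c2i[substr(str,3+i,1)] * 64 + \
                _BASE64_DECODE_c2i[substr(str,4+i,1)]
            i += 4
            # emit only the bytes that are not trailing zeros
            if (v%256 != 0)
                out = out sprintf("%c%c%c", int(v/65536), int(v/256)%256, v%256)
            else if (int(v/256)%256 != 0)
                out = out sprintf("%c%c", int(v/65536), int(v/256)%256)
            else
                out = out sprintf("%c", int(v/65536))
        }
        return out
    }
    { print base64_decode($0) }
    '
}

printf '%s\n' 'SmFuZQ==' 'RE9F' 'amRvZQ==' | b64_decode_lines
```

With the three sample values this should print Jane, DOE and jdoe, one per line.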

CodePudding user response:

The problem is in the base64_decode function, which outputs some junk characters under GNU awk.

As an alternative, you can use this awk code, which calls the system-provided base64 utility (the -d and -i options shown are those of GNU coreutils base64):

{
   delete attrs
   for (i = 2; i <= NF; i++) {
      match($i,/::? /)
      key = substr($i,1,RSTART-1)
      val = substr($i,RSTART+RLENGTH)
      if (RLENGTH == 3) {
         cmd = "echo " val " | base64 -di"
         cmd | getline val   # should also check exit code here
         close(cmd)          # release the pipe so the same command can run again
      }
      attrs[key] = ((key in attrs) ? attrs[key] SUBSEP val : val)
   }
   if ( /\nuid:/ )
      givenName[ attrs["uid"] ] = attrs["givenName"]
   else
      memberUid[ attrs["GroupCode"] ] = attrs["memberUid"]
}
END {
   n = split(memberUid[GroupCode],uid,SUBSEP)
   for ( i = 1; i <= n; i++ )
      print givenName[ uid[i] ]
}

I have tested this with GNU awk and BSD awk and get the expected output in both cases.

If you cannot use an external base64 utility, then I suggest you take a look here for an awk version of base64 decode.
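
To give an idea of what a pure-awk decoder can look like, here is a minimal sketch of my own; it is not the linked code. Instead of guessing padding from zero bytes, it strips the trailing = characters and works out from the remaining length how many bytes the last group holds. The names b64_decode_padaware, b64dec and B64 are invented for this example, and it assumes well-formed input whose decoded text contains no NUL bytes.

```shell
# b64_decode_padaware is a hypothetical helper name used only for this sketch.
# It reads one base64 string per line and prints the decoded text.
b64_decode_padaware() {
    LC_ALL=C awk '
    function b64dec(str,    out,i,k,n,r,v) {
        if ( ! ("A" in B64) )
            for (i = 1; i <= 64; i++)
                B64[substr("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",i,1)] = i-1
        sub(/=+$/, "", str)       # drop padding; the length now tells us what is left
        n = length(str)
        out = ""
        for (i = 1; i <= n; i += 4) {
            r = n - i + 1         # characters available for this group
            if (r > 4)
                r = 4
            v = 0
            for (k = 0; k < 4; k++)
                v = v * 64 + (k < r ? B64[substr(str,i+k,1)] : 0)
            # 2 characters carry 1 byte, 3 carry 2 bytes, 4 carry 3 bytes
            out = out sprintf("%c", int(v/65536))
            if (r > 2)
                out = out sprintf("%c", int(v/256)%256)
            if (r > 3)
                out = out sprintf("%c", v%256)
        }
        return out
    }
    { print b64dec($0) }
    '
}

printf '%s\n' 'SmFuZQ==' 'amRvZQ' 'RE9F' | b64_decode_padaware
```

Because the byte count comes from the length rather than from the padding itself, this also accepts input whose = characters were stripped, such as the second value above.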
