In a bash script that I'm writing for Linux/BSD/Solaris, I need to decode more than a hundred thousand base64-encoded text strings. Because I don't want to fork a non-portable base64 binary from awk that many times, I wrote an awk function that does the decoding. Here's the code of my base64_decode function:
    function base64_decode(str,    out,i,n,v) {
        out = ""
        if ( ! ("A" in _BASE64_DECODE_c2i) )
            for (i = 1; i <= 64; i++)
                _BASE64_DECODE_c2i[substr("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",i,1)] = i-1
        i = 0
        n = length(str)
        while (i <= n) {
            v = _BASE64_DECODE_c2i[substr(str,++i,1)] * 262144 + \
                _BASE64_DECODE_c2i[substr(str,++i,1)] * 4096 + \
                _BASE64_DECODE_c2i[substr(str,++i,1)] * 64 + \
                _BASE64_DECODE_c2i[substr(str,++i,1)]
            out = out sprintf("%c%c%c", int(v/65536), int(v/256), v)
        }
        return out
    }
SAMPLE EXAMPLE
Let's say that I ran the following command:
    ldapsearch -LLL -o ldif-wrap=no -b dc=example,dc=com -h 1.2.3.4 '(|(uid=*)(GroupCode=*))' uid givenName sn GroupCode memberUid

    dn: uid=jsmith,ou=users,dc=example,dc=com
    givenName: John
    sn: SMITH
    uid: jsmith

    dn: uid=jdoe,ou=users,dc=example,dc=com
    uid: jdoe
    givenName:: SmFuZQ==
    sn:: RE9F

    dn: cn=group1,ou=groups,dc=example,dc=com
    GroupCode: 025496
    memberUid:: amRvZQ==
    memberUid: jsmith
Now I want to output the givenName of the users who are members of the group whose GroupCode is 025496. Here is an awk command that does so:
    LANG=C command -p awk -F '\n' -v RS='' -v GroupCode=025496 '
    {
        delete attrs
        for (i = 2; i <= NF; i++) {
            match($i,/::? /)
            key = substr($i,1,RSTART-1)
            val = substr($i,RSTART+RLENGTH)
            if (RLENGTH == 3)
                val = base64_decode(val)
            attrs[key] = ((key in attrs) ? attrs[key] SUBSEP val : val)
        }
        if ( /\nuid:/ )
            givenName[ attrs["uid"] ] = attrs["givenName"]
        else
            memberUid[ attrs["GroupCode"] ] = attrs["memberUid"]
    }
    END {
        n = split(memberUid[GroupCode],uid,SUBSEP)
        for ( i = 1; i <= n; i++ )
            print givenName[ uid[i] ]
    }
    function base64_decode(str,    out,i,n,v) { ... }
    '
On BSD and Solaris, the output is:
    Jane
    John
On Linux (GNU awk), the output is:
    John
What am I doing wrong?
Answer:
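Your function generates NUL bytes when its argument (the encoded string) ends with padding characters (=s). Because the loop condition is i <= n, it also runs one extra iteration after the last 4-character group and appends three more NULs. GNU awk keeps NUL bytes inside strings, so the decoded memberUid becomes "jdoe" followed by NULs and no longer matches the "jdoe" key of givenName; the BSD and Solaris awks apparently treat strings as NUL-terminated, which silently drops the NULs and hides the bug there. Below is a corrected version of your while loop: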
    while (i < n) {
        v = _BASE64_DECODE_c2i[substr(str,1+i,1)] * 262144 + \
            _BASE64_DECODE_c2i[substr(str,2+i,1)] * 4096 + \
            _BASE64_DECODE_c2i[substr(str,3+i,1)] * 64 + \
            _BASE64_DECODE_c2i[substr(str,4+i,1)]
        i += 4
        # "=" is not in the table, so padding decodes as zero bits:
        # emit only the bytes that are not trailing zeros
        if (v%256 != 0)
            out = out sprintf("%c%c%c", v/65536, v/256, v)
        else if (int(v/256)%256 != 0)
            out = out sprintf("%c%c", v/65536, v/256)
        else
            out = out sprintf("%c", v/65536)
    }
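The check above infers how many bytes to emit from trailing zero bytes, so it would also drop a genuine NUL at the end of any 3-byte group. That is harmless for text, but if it matters, one alternative is to count the "=" padding characters instead. Below is a minimal sketch under stated assumptions: every value is standard, "="-padded base64 (length a multiple of 4, as LDIF emits after "::"), and the shell wrapper b64decode_lines and the table name _B64_c2i are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: a padding-aware base64 decoder in portable awk.
# b64decode_lines is a hypothetical wrapper that decodes one value per input line.
# Assumes each line is "="-padded base64 whose length is a multiple of 4.
b64decode_lines() {
    # LC_ALL=C so that %c emits single bytes rather than multibyte characters.
    LC_ALL=C awk '
    function base64_decode(str,    out, i, n, v, pad, k) {
        # Build the character -> 6-bit value table on first use.
        if (!("A" in _B64_c2i))
            for (k = 1; k <= 64; k++)
                _B64_c2i[substr("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", k, 1)] = k - 1
        n = length(str)
        # Count trailing "=" characters: the last group then holds 3 - pad bytes.
        pad = 0
        if (substr(str, n, 1) == "=") {
            pad = 1
            if (substr(str, n - 1, 1) == "=") pad = 2
        }
        out = ""
        for (i = 1; i <= n; i += 4) {
            # "=" is not in the table, so it contributes zero bits.
            v = _B64_c2i[substr(str, i, 1)] * 262144 + \
                _B64_c2i[substr(str, i + 1, 1)] * 4096 + \
                _B64_c2i[substr(str, i + 2, 1)] * 64 + \
                _B64_c2i[substr(str, i + 3, 1)]
            out = out sprintf("%c", int(v / 65536))
            # Non-final groups always carry 3 bytes; the final one carries 3 - pad.
            if (i + 4 <= n || pad < 2)
                out = out sprintf("%c", int(v / 256) % 256)
            if (i + 4 <= n || pad < 1)
                out = out sprintf("%c", v % 256)
        }
        return out
    }
    { print base64_decode($0) }
    '
}

# Prints Jane, DOE and jdoe, one per line.
printf 'SmFuZQ==\nRE9F\namRvZQ==\n' | b64decode_lines
```

Because a single awk process decodes every value, no per-value fork is needed; the base64_decode function can be pasted into the question's program in its place, since it takes and returns the same kind of values.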
Answer:
The problem is in the base64_decode function, which outputs extra NUL characters on GNU awk. As an alternative, you can use this awk code, which calls the system-provided base64 utility (at the cost of one fork per encoded value):
    {
        delete attrs
        for (i = 2; i <= NF; i++) {
            match($i,/::? /)
            key = substr($i,1,RSTART-1)
            val = substr($i,RSTART+RLENGTH)
            if (RLENGTH == 3) {
                cmd = "echo " val " | base64 -di"
                cmd | getline val # should also check exit code here
                close(cmd)        # free the pipe; a repeated value would otherwise read EOF
            }
            attrs[key] = ((key in attrs) ? attrs[key] SUBSEP val : val)
        }
        if ( /\nuid:/ )
            givenName[ attrs["uid"] ] = attrs["givenName"]
        else
            memberUid[ attrs["GroupCode"] ] = attrs["memberUid"]
    }
    END {
        n = split(memberUid[GroupCode],uid,SUBSEP)
        for ( i = 1; i <= n; i++ )
            print givenName[ uid[i] ]
    }
I have tested this with GNU and BSD awk and get the expected output in all cases.
If you cannot use an external base64 utility, then I suggest you take a look here for an awk version of base64 decoding.