I've written a shell function that converts a UTF-8-encoded string to a JSON string, using awk:
json_stringify() {
    LANG=C awk '
        BEGIN {
            for ( i = 1; i < ARGC; i++ )
                print json_stringify(ARGV[i])
        }
        function json_stringify( str, _str, _out ) {
            if( ! ("\\" in _ESC_) )
                for ( i = 1; i <= 127; i++ )
                    _ESC_[ sprintf( "%c", i) ] = sprintf( "\\u%04x", i )
            _str = str
            _out = "\""
            while ( match( _str, /[\"\\[:cntrl:]]/ ) ) {
                _out = _out substr(_str,1,RSTART-1) _ESC_[substr(_str,RSTART,RLENGTH)]
                _str = substr( _str, RSTART + RLENGTH )
            }
            return _out _str "\""
        }
    ' "$@"
}
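For reference, each entry of the escape table is meant to be the \uXXXX form produced by sprintf("\\u%04x", i). Below is a quick standalone check of that format string, assuming any POSIX awk; esc_entry is a hypothetical helper name made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON \uXXXX escape for a decimal character code,
# using the same sprintf format as the escape table in json_stringify.
esc_entry() {
  awk -v c="$1" 'BEGIN { print sprintf("\\u%04x", c) }'
}

esc_entry 34   # double quote -> \u0022
esc_entry 92   # backslash    -> \u005c
esc_entry 9    # tab          -> \u0009
```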
It feels like I missed something trivial, because when I run (in bash):
json_stringify 'A"B' 'C\D' $'\b \f \t \r \n'
I get:
"A\u0022B"
while my expected output is:
"A\u0022B"
"C\u005cD"
"\u0008 \u000c \u0009 \u000d \u000a"
What could be the problem(s) in my code?
Answer:
One issue I see is the dual use of i as a loop variable in both the awk BEGIN block and the function. Because i is not local to the function, there is only one instance of i for the entire script. The net result is that the first call to the function, which builds the _ESC_ table, leaves i at 128 (the first value that fails i <= 127). That is well beyond ARGC, so once the BEGIN block's i++ bumps it to 129 the loop condition fails, and the BEGIN block runs only once (i=1).
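A minimal sketch that reproduces the same effect in isolation, assuming only a POSIX awk; scope_demo and clobber are hypothetical names made up for this illustration. The inner loop leaves the shared i at 6, the outer i++ turns it into 7, and the outer loop stops after a single pass.

```shell
#!/bin/sh
# Hypothetical demo: clobber() shares the global i with the BEGIN loop,
# just as json_stringify does in the question.
scope_demo() {
  awk '
    function clobber() {
      for (i = 1; i <= 5; i++)   # no local i: this is the global i
        n++
    }
    BEGIN {
      for (i = 1; i < 4; i++) {  # intended to run three times
        print "outer i=" i
        clobber()
      }
      print "after loop i=" i
    }'
}

scope_demo
# prints:
# outer i=1
# after loop i=7
```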
Two possible fixes:

Declare i as local to the function. awk has no local keyword (only function parameters are local), so this means listing i as an extra parameter that callers never pass, eg:

    function json_stringify( str, _str, _out, i ) {

Or use a different loop variable (eg, j) in one of the loops, eg:

    # in the BEGIN block:
    for ( j = 1; j < ARGC; j++ )
        print json_stringify(ARGV[j])

    # or in the function:
    for ( j = 1; j <= 127; j++ )
        _ESC_[ sprintf( "%c", j) ] = sprintf( "\\u%04x", j )
With either fix in place, the function generates the expected output:
"A\u0022B"
"C\u005cD"
"\u0008 \u000c \u0009 \u000d \u000a"
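Putting the first fix together, here is a sketch of the complete corrected function: the question's code with i added to the parameter list, plus explanatory comments. It assumes an awk that supports POSIX bracket classes such as [:cntrl:] (gawk does; some older mawk builds do not).

```shell
#!/bin/sh
# Sketch: the question's function with i made local (fix 1).
json_stringify() {
  LANG=C awk '
    BEGIN {
      for (i = 1; i < ARGC; i++)
        print json_stringify(ARGV[i])
    }
    # Parameters after str are never passed by callers, so they act as locals.
    function json_stringify(str,    _str, _out, i) {
      # Build the escape table once: every ASCII code 1..127 -> \uXXXX
      if (!("\\" in _ESC_))
        for (i = 1; i <= 127; i++)
          _ESC_[sprintf("%c", i)] = sprintf("\\u%04x", i)
      _str = str
      _out = "\""
      # Replace each double quote, backslash or control character with its escape
      while (match(_str, /[\"\\[:cntrl:]]/)) {
        _out = _out substr(_str, 1, RSTART - 1) _ESC_[substr(_str, RSTART, RLENGTH)]
        _str = substr(_str, RSTART + RLENGTH)
      }
      return _out _str "\""
    }
  ' "$@"
}

json_stringify 'A"B' 'C\D'
# prints:
# "A\u0022B"
# "C\u005cD"
```

The control-character case from the question still needs bash's $'...' quoting (or printf) to build its argument.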
Controlling Variable Scope: a brief discussion of this topic.
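Since only function parameters are local in awk, the same toy example behaves as intended once i is listed as an extra, never-passed parameter of clobber. This is a sketch with hypothetical names (scope_fixed_demo, clobber), assuming any POSIX awk; the outer loop now runs all three passes.

```shell
#!/bin/sh
# Hypothetical demo: listing i as an extra parameter makes it local to clobber().
scope_fixed_demo() {
  awk '
    function clobber(    i) {    # i is a parameter nobody passes, hence local
      for (i = 1; i <= 5; i++)
        n++
    }
    BEGIN {
      for (i = 1; i < 4; i++) {
        print "outer i=" i
        clobber()
      }
      print "after loop i=" i
    }'
}

scope_fixed_demo
# prints:
# outer i=1
# outer i=2
# outer i=3
# after loop i=4
```

The extra spaces before i in the parameter list are only a readability convention that marks it as a local rather than a real argument.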