Awk or Sed Command to Fix Bad JSON formatting?

Time:07-15

Okay, so I've got over a hundred JSON files with predictable bad formatting in several places per file.

They use { } to indicate an array instead of [ ].

For example:

"grid": {
"C1", "D1", "E1", "C2", "D2", "E2", "F2", "B3", "C3", "D3", "E3", "F3", "B4", "C4", "D4", "E4", "F4", "C5", "D5", "E5", "F5", "C6", "D6", "E6"
},

Each file has multiple arrays in it with this problem, each with a different key.

I came up with this to fix the above example, but it isn't very universal:

sed 's/^\t\t"grid": {/\t\t"grid": [/; s/"E6" },$/"E6" ],/' myfile.json

I also tried writing a more complicated awk script, something along these lines:

awk -i inplace '/grid/ { gsub("{","["); gsub("}","]"); print $0 }' myfile.json

But it replaced the contents of myfile.json with only the row that contained the string "grid".
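A likely reason for that behaviour is that print sits inside the /grid/ action, so lines that do not match are never written back. A minimal sketch of the same idea that prints every line is below; the function name fix_grid_lines is made up for illustration, and it still assumes the whole array sits on the line that mentions grid.

```shell
# Hypothetical helper: swap braces for brackets only on lines mentioning
# "grid", then print every line so the rest of the file survives.
fix_grid_lines() {
    awk '/grid/ { gsub(/[{]/, "["); gsub(/[}]/, "]") } { print }' "$@"
}

# Small demonstration on inline input.
printf '%s\n' '{' '"grid": {"C1", "D1"},' '"other": 1' '}' | fix_grid_lines
```

The second, pattern-less { print } block runs for every input line, which is what the original attempt was missing.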

Is there a reliable one-liner to fix this issue?

CodePudding user response:

I propose the following GNU AWK solution. Let the content of file.json be

{"hello": 1,
"grid": {"C1", "D1", "E1", "C2", "D2", "E2", "F2", "B3", "C3", "D3", "E3", "F3", "B4", "C4", "D4", "E4", "F4", "C5", "D5", "E5", "F5", "C6", "D6", "E6"},
"something": "else"}

then

awk 'BEGIN{FPAT=".";OFS=""}/grid/&&match($0,/\{[^}]*\}/){$RSTART="[";$(RSTART+RLENGTH-1)="]"}{print}' file.json

gives output

{"hello": 1,
"grid": ["C1", "D1", "E1", "C2", "D2", "E2", "F2", "B3", "C3", "D3", "E3", "F3", "B4", "C4", "D4", "E4", "F4", "C5", "D5", "E5", "F5", "C6", "D6", "E6"],
"something": "else"}

Explanation: first I tell GNU AWK that a field is any single character (FPAT=".") and that the output field separator (OFS) is the empty string; without that there would be unwanted spaces in the output. Then, for each line that contains grid and also matches a literal {, followed by zero or more (*) characters that are not (^) }, followed by a literal }, I replace the first character of the match ($RSTART) with [ and the last character of the match ($(RSTART+RLENGTH-1)) with ]. Every line, altered or not, is then printed. Note that I use the match function rather than a bare regular expression because I need RSTART and RLENGTH, which are set by that function. The return value of match is part of the condition, so a line that contains grid but no {...} is left unchanged.

(tested in gawk 4.2.1)
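For awk implementations without FPAT, the same RSTART/RLENGTH arithmetic can be expressed with substr instead of per-character fields. This is a sketch of that swapped-in technique, not the answer's own method; the function name fix_grid_substr and the variable names pre, mid and post are hypothetical.

```shell
# Hypothetical helper: rebuild the line around the first {...} match
# using substr, so no gawk-specific FPAT is needed.
fix_grid_substr() {
    awk '/grid/ && match($0, /[{][^}]*[}]/) {
        # RSTART is the 1-based offset of "{";
        # RSTART + RLENGTH - 1 is the offset of the matching "}".
        pre  = substr($0, 1, RSTART - 1)
        mid  = substr($0, RSTART + 1, RLENGTH - 2)
        post = substr($0, RSTART + RLENGTH)
        $0 = pre "[" mid "]" post
    }
    { print }' "$@"
}

# Small demonstration on inline input.
printf '%s\n' '"grid": {"C1", "D1"},' | fix_grid_substr
```

Like the original, it only touches lines that contain grid and only the first {...} pair on such a line.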

CodePudding user response:

How's this? (Update: probably scroll down to the Perl version near the end.)

sed -e 's/{\(\([0-9.]\+\|false\|true\|null\|"[^"]*"\) *[,}]\)/[\1/g' \
    -e 's/\([,[] *\([0-9.]\+\|false\|true\|null\|"[^"]*"\)\)}/\1]/g' file

In other words, if the thing after {"thing" or before "thing"} is a comma or a curly brace (and not a colon, like you would expect in a proper JSON dictionary), switch the curly to a square bracket. (In the second expression, we will already have replaced any opening curly with a square one, so look for that instead.)

The regex could be made less fugly if your sed supports -E or -r, but unfortunately, this non-standard option is not portable. (In brief, it lets you use the ERE regex dialect instead of BRE, where you mind-numbingly have to backslash grouping parentheses etc.)
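For comparison, here is roughly what the two substitutions might look like in the ERE dialect, assuming a sed that accepts -E (GNU and BSD sed both do). The wrapper name fix_braces_ere is made up for illustration.

```shell
# Hypothetical wrapper: the same two substitutions written for ERE.
# Groups need no backslashes here, but the literal braces now do.
fix_braces_ere() {
    sed -E \
        -e 's/\{(([0-9.]+|false|true|null|"[^"]*") *[,}])/[\1/g' \
        -e 's/([,[] *([0-9.]+|false|true|null|"[^"]*"))\}/\1]/g' "$@"
}

# Small demonstration on inline input.
printf '%s\n' '{"hello": 1,' '"grid": {"C1", "D1", "E1"},' '"something": "else"}' | fix_braces_ere
```

The real object braces are left alone because the value after the opening one is followed by a colon, and the value before the closing one is preceded by a colon.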

Unfortunately, it requires the curly to be on the same line as the contents of the array. Also, like any regex solution, it's not easily able to distinguish between (what looks like JSON inside) a quoted string and actual JSON.

Demo: https://ideone.com/PoZguV

I suppose the same approach could be extended to examine lines which start or end with a lone curly brace, but I'd switch to Awk or Perl for that. In fact, Perl's "slurp mode" perl -0777 could probably handle the entire input file in one go with minor modifications to the regexes.

perl -0777 -pe '
    s/\{(\s*(?:[0-9.]+|false|true|null|"[^"]*")\s*[,}])/[$1/g;
    s/([,[]\s*(?:[0-9.]+|false|true|null|"[^"]*")\s*)\}/$1]/g' file.json

This removes any reliance on newlines for analyzing the file, since we read all of it into memory, and rely on \s to match any whitespace, including newlines. If you want to modify the file in-place, Perl also supports the -i option, like some versions of sed.

Demo: https://ideone.com/0C4gPt

CodePudding user response:

#!/bin/bash
FILE="test.json"
JSON="$(sed -E 's/([}{])/\n\1\n/g' "$FILE")"
while :; do
    JQTEST=$(jq  '.' <<<"$JSON" 2>&1|grep "Objects must consist of key:value pairs at line")
    rc=$?
    if [ $rc -eq 0 ]; then 
        LINE=$(sed -E "s/.* line ([0-9]+), .*/\1/" <<<"$JQTEST")
        COL=$(sed -E "s/.* column ([0-9]+)$/\1/" <<<"$JQTEST")
        [ "$COL" -ne 1 ] && LINE=$((LINE-1)) 
        JSON=$(sed -E "$LINE s/\{/[/; $LINE s/}/]/" <<<"$JSON")
    else
      jq  '.' <<<"$JSON" # > "new_${FILE}" or "${FILE}" 
      break
    fi
done

$ cat test.json 
{
"grid1": {"C1", "D1", "E1", "C2"}, 
"grid2": {"C1", "D1", "E1", "C2"}, 
"grid3": {"C1", "D1", "E1", "C2"} 
}


$ script.sh 
{
  "grid1": [
    "C1",
    "D1",
    "E1",
    "C2"
  ],
  "grid2": [
    "C1",
    "D1",
    "E1",
    "C2"
  ],
  "grid3": [
    "C1",
    "D1",
    "E1",
    "C2"
  ]
}