linux / bash parse through json-like data


Here is some data that I have:

animal { 
    dog {
        body {
            parts {
                legs = old
                brain = average
                tail= curly
                }
   
            }
        }
    cat {
        body {
            parts {
                legs = new
                brain = average
                tail {
                    base=hairy
                    tip=nothairy
                }
   
            }
        }
    }
}

Notice the data is not really JSON; it follows these rules:

  • the = between a key and its value may or may not be surrounded by spaces.
  • No " or , anywhere in the data; entries are separated by newlines.

Is it even possible to parse this with awk or sed? I tried jq, but it does not work since this isn't true JSON.

My goal is to display only "dog" and "cat", since they are the top-level keys under "animal".

$ some-magical-command
dog
cat

CodePudding user response:

It's fairly close to Tcl syntax, if you feel like learning a new language.

set data {
    animal { 
        dog {
            body {
                parts {
                    legs = old
                    brain = large
                    tail= curly
                    }
       
                }
            }
        cat {
            body {
                parts {
                    legs = new
                    brain = tiny
                    tail {
                        base=hairy
                        tip=nothairy
                    }
       
                }
            }
        }
    }
}

set data [regsub -line -all {\s*=\s*(.+)} $data { "\1"}]

dict get $data animal dog body parts brain    ;# => large

I know some people who would argue about your classification of dog brains vs cat brains...
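To get the output the question actually asks for, the second-level keys can be listed with dict keys. A sketch of running the whole thing under tclsh (this assumes tclsh is installed; the heredoc just wraps the code above):

```shell
# Run the Tcl answer end-to-end (assumes tclsh is on PATH).
tclsh <<'EOF'
set data {
    animal {
        dog {
            body {
                parts {
                    legs = old
                    brain = large
                    tail= curly
                }
            }
        }
        cat {
            body {
                parts {
                    legs = new
                    brain = tiny
                    tail {
                        base=hairy
                        tip=nothairy
                    }
                }
            }
        }
    }
}
# Quote every value so the data parses as a nested dict.
set data [regsub -line -all {\s*=\s*(.+)} $data { "\1"}]
puts [dict get $data animal dog body parts brain]  ;# large
puts [dict keys [dict get $data animal]]           ;# dog cat
EOF
```

Tcl dicts preserve insertion order, so the keys come out as dog then cat, matching the desired output.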

CodePudding user response:

If you only need the second-level keys, and you're not too concerned about producing good error messages for erroneous inputs, then it's pretty straight-forward. The basic idea is this:

  1. There are three formats for an input line:

    • ID {
    • ID = value # where the = might not be space-separated
    • }
  2. As the lines are read, we keep track of nesting depth by incrementing a counter with the first line type and decrementing it with the third line type.

  3. When the nesting counter is 1, if the line has an ID field, we print it.

That can be done quite simply with an awk script. This script should be saved in a file with a name like level2_keys.awk; you can then execute the command awk -f level2_keys.awk /path/to/input/file. Note that all the rules end with next; to avoid rules following a match being evaluated.

$1 == "}"    { # Decrement nesting on close
               --nesting;
               next;
             }
/=/          { # Remove the if block if you don't want to print these keys.
               if (nesting == 1) {
                 gsub("=", " = ");    # Force = to be a field
                 print($1);
               }
               next;
             }
$2 == "{"    { # Increment nesting (and maybe print) on open
               if (nesting == 1) print($1);
               ++nesting;
               next;
             }
# NF is non-zero if the line is not blank.
NF           { print "Bad input at " NR ": '"$0"'" > "/dev/stderr"; }
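As a quick sanity check, here is the whole thing run end-to-end in a shell. The file name animals.txt is illustrative, and the inline awk program is the same logic as the script above, condensed (the bad-input rule is dropped for brevity):

```shell
# Create a sample input file (name is illustrative).
cat > animals.txt <<'EOF'
animal {
    dog {
        body {
            parts {
                legs = old
                brain = average
                tail= curly
            }
        }
    }
    cat {
        body {
            parts {
                legs = new
                brain = average
                tail {
                    base=hairy
                    tip=nothairy
                }
            }
        }
    }
}
EOF

# Same logic as level2_keys.awk, condensed: track depth with a
# counter and print $1 only when we open a block at depth 1.
awk '
  $1 == "}" { --nesting; next }                          # close: decrement depth
  /=/       { if (nesting == 1) print $1; next }         # key = value line
  $2 == "{" { if (nesting == 1) print $1; ++nesting; next }  # open: maybe print, increment
' animals.txt
# Output:
# dog
# cat
```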