Process multi-line json with trailing commas

Time:04-26

I have JSON in files with text contents like:

{
    "id": "01",
    "Variables": [
        {
            "Name": "myvar",
            "Value": "15"
        }
    ]
},
{
    "id": "01",
    "Variables": [
        {
            "Name": "myvar",
            "Value": "15"
        }
    ]
}

Note that this example has two records, each spanning multiple lines, with a comma between them.

This comma between records makes the file difficult to process, e.g. with jq:

$ cat myfile.json | jq -s

parse error: Expected value before ',' at line 9...

Or with python:

import json
with open("alert_feedback_20220424.json", "r") as f:
    j = json.load(f)

json.decoder.JSONDecodeError: Extra data: line 9...

Ultimately, I want to read this data with Spark:

spark.read.option(
    'sep', ','
).option(
    'header', False
).option(
    'multiLine', True
).csv(
    'file://my/project/data/myfile.json'
)

But this doesn't seem to parse the JSON correctly. I can add details on request.

How can I programmatically remove this comma after each JSON record, or otherwise format this json to be parsed correctly?

CodePudding user response:

Using jq, read the whole file as raw text with the -R option, slurp it into one long string with the -s option, wrap that string in brackets, and decode it with fromjson. The result is a valid JSON array.

jq -Rs '"[\(.)]" | fromjson' myfile.json
[
  {
    "id": "01",
    "Variables": [
      {
        "Name": "myvar",
        "Value": "15"
      }
    ]
  },
  {
    "id": "01",
    "Variables": [
      {
        "Name": "myvar",
        "Value": "15"
      }
    ]
  }
]
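The same wrap-in-brackets idea can be sketched in Python, since the question also tried json.load. This is a minimal sketch, not part of the original answer: the function name parse_comma_separated_records and the inline sample are hypothetical, and it assumes the file is small enough to hold in memory and that commas appear only between records (plus, at most, one trailing comma at the end).

```python
import json


def parse_comma_separated_records(text: str) -> list:
    """Wrap comma-separated JSON records in brackets and decode them as one array."""
    # Strip surrounding whitespace and any trailing comma, which would
    # otherwise make the wrapped array invalid JSON.
    body = text.strip().rstrip(",")
    return json.loads(f"[{body}]")


# Hypothetical sample mirroring the structure shown in the question.
sample = """{
    "id": "01",
    "Variables": [{"Name": "myvar", "Value": "15"}]
},
{
    "id": "01",
    "Variables": [{"Name": "myvar", "Value": "15"}]
}
"""

records = parse_comma_separated_records(sample)
print(len(records))
```

For a real file, the text would come from something like open("myfile.json").read() in place of the inline sample.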

Use the [] iterator (fromjson[] is equivalent to fromjson | .[]) to output the individual items (without commas in between):

jq -Rs '"[\(.)]" | fromjson[]' myfile.json
{
  "id": "01",
  "Variables": [
    {
      "Name": "myvar",
      "Value": "15"
    }
  ]
}
{
  "id": "01",
  "Variables": [
    {
      "Name": "myvar",
      "Value": "15"
    }
  ]
}
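Because the end goal in the question is Spark, one option is to rewrite the records as JSON Lines (one compact object per line), which is the layout spark.read.json reads by default. The sketch below covers only the conversion, using the standard library; the function name to_json_lines and the sample are hypothetical, and the Spark step itself has not been tested here.

```python
import json


def to_json_lines(text: str) -> str:
    """Convert comma-separated JSON records into JSON Lines text."""
    # Wrap the records in brackets so they decode as a single array,
    # dropping any trailing comma first.
    records = json.loads(f"[{text.strip().rstrip(',')}]")
    # Emit each record as compact JSON on its own line.
    return "".join(json.dumps(record, separators=(",", ":")) + "\n" for record in records)


# Hypothetical sample mirroring the structure shown in the question.
sample = """{
    "id": "01",
    "Variables": [{"Name": "myvar", "Value": "15"}]
},
{
    "id": "01",
    "Variables": [{"Name": "myvar", "Value": "15"}]
}
"""

json_lines = to_json_lines(sample)
print(json_lines, end="")
```

Writing json_lines to a file and pointing spark.read.json at that path should then yield one row per record, assuming the default (non-multiLine) JSON reader settings.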
