I have files containing JSON text like:
{
  "id": "01",
  "Variables": [
    {
      "Name": "myvar",
      "Value": "15"
    }
  ]
},
{
  "id": "01",
  "Variables": [
    {
      "Name": "myvar",
      "Value": "15"
    }
  ]
}
Note that this example has two records, each split over multiple lines, with a comma between them.
This comma delimiter between records makes the file invalid JSON, so it is difficult to process, e.g. with jq:
$ cat myfile.json | jq -s
parse error: Expected value before ',' at line 9...
Or with Python:
import json
with open("alert_feedback_20220424.json", "r") as f:
j = json.load(f)
json.decoder.JSONDecodeError: Extra data: line 9...
Ultimately I want to read this data with Spark:
spark.read \
    .option('sep', ',') \
    .option('header', False) \
    .option('multiLine', True) \
    .csv('file://my/project/data/myfile.json')
But this doesn't seem to parse the JSON correctly; I'll add details on request when I have time.
Links to solutions tried:
- parse error: Expected value before ',' at line 71, column 2
- Python: Change multi-line json String to single line
- Remove trailing json comma with command line tools
How can I programmatically remove the comma after each JSON record, or otherwise reformat this JSON so it parses correctly?
CodePudding user response:
Using jq, read in the whole file as raw text using the -R option, receive it as one long string using the -s option, wrap that string in brackets, and use fromjson to decode it from JSON. You should now have a valid array.
jq -Rs '"[\(.)]" | fromjson' myfile.json
[
  {
    "id": "01",
    "Variables": [
      {
        "Name": "myvar",
        "Value": "15"
      }
    ]
  },
  {
    "id": "01",
    "Variables": [
      {
        "Name": "myvar",
        "Value": "15"
      }
    ]
  }
]
Use .[] to get the individual items (without commas in between):
jq -Rs '"[\(.)]" | fromjson[]' myfile.json
{
  "id": "01",
  "Variables": [
    {
      "Name": "myvar",
      "Value": "15"
    }
  ]
}
{
  "id": "01",
  "Variables": [
    {
      "Name": "myvar",
      "Value": "15"
    }
  ]
}
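The same wrap-in-brackets idea also unblocks the Python attempt from the question. A minimal sketch, assuming the file contains only the comma-separated records shown above:

import json

with open("alert_feedback_20220424.json", "r") as f:
    # Wrap the comma-separated records in brackets so the whole
    # file parses as one JSON array.
    records = json.loads("[" + f.read() + "]")

print(records[0]["Variables"][0]["Value"])  # -> 15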
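And since the end goal is Spark: the CSV reader from the question won't interpret JSON structure, but once the records are wrapped in brackets, Spark's JSON reader can load the file using its multiLine option. A sketch, assuming the jq output is first written to a hypothetical myfile_fixed.json:

jq -Rs '"[\(.)]" | fromjson' myfile.json > myfile_fixed.json

Then in PySpark (each element of the array becomes one row):

df = spark.read \
    .option('multiLine', True) \
    .json('file://my/project/data/myfile_fixed.json')
df.show()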