I have files containing JSON text like:
{
  "id": "01",
  "Variables": [
    {
      "Name": "myvar",
      "Value": "15"
    }
  ]
},
{
  "id": "01",
  "Variables": [
    {
      "Name": "myvar",
      "Value": "15"
    }
  ]
}
Note that this example has two records, each split over multiple lines, with a comma between them.
This comma delimiter between records makes the file invalid JSON, so it is difficult to process, e.g. with jq:
$ cat myfile.json | jq -s
parse error: Expected value before ',' at line 9...
Or with Python:
import json
with open("alert_feedback_20220424.json", "r") as f:
j = json.load(f)
json.decoder.JSONDecodeError: Extra data: line 9...
Ultimately I want to read this data with Spark:
spark.read \
    .option('sep', ',') \
    .option('header', False) \
    .option('multiLine', True) \
    .csv('file://my/project/data/myfile.json')
But this doesn't seem to parse the JSON correctly; I'll add details on request when I have time.
Links to solutions tried:
- parse error: Expected value before ',' at line 71, column 2
- Python: Change multi-line json String to single line
- Remove trailing json comma with command line tools
How can I programmatically remove the comma after each JSON record, or otherwise reformat this JSON so it parses correctly?
CodePudding user response:
Using jq, read in the whole file as raw text using the -R option, receive it as one long string using the -s option, wrap that string in brackets, and use fromjson to decode it from JSON. You should now have a valid array.
jq -Rs '"[\(.)]" | fromjson' myfile.json
[
  {
    "id": "01",
    "Variables": [
      {
        "Name": "myvar",
        "Value": "15"
      }
    ]
  },
  {
    "id": "01",
    "Variables": [
      {
        "Name": "myvar",
        "Value": "15"
      }
    ]
  }
]
Use .[] to get the individual items (without commas in between):
jq -Rs '"[\(.)]" | fromjson[]' myfile.json
{
  "id": "01",
  "Variables": [
    {
      "Name": "myvar",
      "Value": "15"
    }
  ]
}
{
  "id": "01",
  "Variables": [
    {
      "Name": "myvar",
      "Value": "15"
    }
  ]
}
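The same wrap-in-brackets idea also unblocks the Python attempt from the question. A minimal sketch, assuming the file contains only the comma-separated records shown above:

import json

with open("alert_feedback_20220424.json", "r") as f:
    # Wrap the comma-separated records in brackets so the whole
    # file parses as one JSON array.
    records = json.loads("[" + f.read() + "]")

print(records[0]["Variables"][0]["Value"])  # -> 15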
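And since the end goal is Spark: the CSV reader from the question won't interpret JSON structure, but once the records are wrapped in brackets, Spark's JSON reader can load the file using its multiLine option. A sketch, assuming the jq output is first written to a hypothetical myfile_fixed.json:

jq -Rs '"[\(.)]" | fromjson' myfile.json > myfile_fixed.json

Then in PySpark (each element of the array becomes one row):

df = spark.read \
    .option('multiLine', True) \
    .json('file://my/project/data/myfile_fixed.json')
df.show()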