I currently have one json file as follows in terms of formatting:
{
  "Comment":"json data",
  "Changes":[
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record1",
        "Type":"CNAME",
        "SetIdentifier":"record1-ap-northeast",
        "GeoLocation":{
          "CountryCode":"JP"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record1"
          }
        ],
        "HealthCheckId":"ID"
      }
    },
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record2",
        "Type":"CNAME",
        "SetIdentifier":"record2-ap-south",
        "GeoLocation":{
          "CountryCode":"SG"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record2"
          }
        ],
        "HealthCheckId":"ID"
      }
    },
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record3",
        "Type":"CNAME",
        "SetIdentifier":"record3-ap-west",
        "GeoLocation":{
          "CountryCode":"IN"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record3"
          }
        ],
        "HealthCheckId":"ID"
      }
    },
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record4.",
        "Type":"CNAME",
        "SetIdentifier":"record4",
        "GeoLocation":{
          "CountryCode":"*"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record4-ap-west"
          }
        ],
        "HealthCheckId":"ID"
      }
    }
  ]
}
The original file has 20000 such values under the "Changes" key. I want to split it into files of 830 values each, creating as many files as needed. To achieve this I need it in the format below:
{
  "Comment":"json data",
  "Changes":[
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record4.",
        "Type":"CNAME",
        "SetIdentifier":"record4",   #830 such arrays in each file
        "GeoLocation":{
          "CountryCode":"*"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record4-ap-west"
          }
        ],
        "HealthCheckId":"ID"
      }
    }
  ]
}
I've created the shell script below to do this:
#!/bin/bash
# Set the input file name
input_file="input.json"
# Set the output file prefix
output_file_prefix="output"
# Set the number of objects per output file
objects_per_file=830
# Skip the first two lines of the input file
tail -n +3 "$input_file" > temp.json
# Get the total number of lines in the input file
total_lines=$(wc -l < temp.json)
# Calculate the number of output files needed
output_files=$(((total_lines + objects_per_file - 1) / objects_per_file))
# Split the input file into multiple output files
split -l $objects_per_file temp.json "$output_file_prefix"
# Loop through each output file and add the opening and closing square brackets
for file in "$output_file_prefix"*; do
echo "[" > "$file".json
cat "$file" >> "$file".json
echo "]" >> "$file".json
rm "$file"
done
# Remove the temporary file
rm temp.json
**By using this I am getting output, but it is broken: the script counts 830 lines rather than 830 objects.** Format:
#start of file
[
  {
    "Action": "DELETE",
    "ResourceRecordSet":
    {
      "Name": "record1",
      "Type": "CNAME",
      "SetIdentifier": "record1-ap-northeast",
      "GeoLocation": {
        "CountryCode": "JP"
      },
      "TTL": 60,
      "ResourceRecords": [
        {
          "Value": "record1"
        }
      ],
      "HealthCheckId": "ID"
    }
  },
#end of file
  {
    "Action": "DELETE",
    "ResourceRecordSet":
    {
      "Action": "DELETE",
      "ResourceRecordSet":
      {
        "Name": "record4.",
        "Type": "CNAME",
        "SetIdentifier": "record4"
]
How do I achieve the required result? Due to a character limitation I cannot have more than 830 such objects in each file. I tried using the jq tool to achieve this but I am completely new to it. Could you please help me with this?
CodePudding user response:
If you wish to use jq, you will have to do it in two or three steps. Each step, however, is very easy.
The first step uses jq with the -c option to create a JSONLines file with the JSON objects you want:
< input.json jq -c '
(.Changes | _nwise(830)) as $C # 830 per problem statement
| .Changes = $C
' > output.jsonl
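Note that _nwise is an undocumented jq internal and may not be exposed in every build; if yours lacks it, an equivalent helper can be defined inline:
< input.json jq -c '
  def nwise($n): def n: if length <= $n then . else .[0:$n] , (.[$n:] | n) end; n;
  (.Changes | nwise(830)) as $C
  | .Changes = $C
' > output.jsonl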
Next, partition output.jsonl into the files you want. This can be done in many ways, e.g. using awk, or even the shell's read.
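Since each line of output.jsonl is already a complete change-batch document, one file per line suffices; a minimal sketch with awk (the output<N>.json names are just an example):
awk '{ f = "output" NR ".json"; print > f; close(f) }' output.jsonl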
Finally, if you want the separate files to be "pretty-printed", you could use jq to do that in the obvious way.
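For instance, assuming the output<N>.json names from the previous sketch:
for f in output*.json; do
  jq . "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done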
CodePudding user response:
So even assuming some of the JSON "rows" are compacted into one line (e.g. the equivalent of jq -c) while others are pretty-printed in a tree format, all you need is the right regex in awk to identify the record separator ("RS"):
gcat <( printf '%s' "$json_in_1$json_in_1$ajson_in_1" | jq -c ) \
     <( printf '%s\n%s\n%s' "$json_in_1" "$json_in_1" "$json_in_1" ) |
{m,g,n}awk '
BEGIN {
    RS  = (_ = "[[:space:]]*") (__ = "[}]") \
          (_)__ (_)"[]]" (_)__ (FS = "\n") "?"
    ORS = (_ = "}")_ ("]")_ FS
    OFS = "\f\r\t"
    _ =_^= __ = (_<_)
}
{
    printf(" NR # %d | NF = %d :: %s>>>>%s%s%.*s%s>>>>%s",
           NR, NF, FS, FS, $__, _<NF, FS, ORS, FS, FS)
}'
NR # 1 | NF = 1 ::
>>>>
{"Comment":"json data","Changes":[{"Action":"DELETE","ResourceRecordSet":{"Name":"record4.","Type":"CNAME","SetIdentifier":"record4","GeoLocation":{"CountryCode":"*"},"TTL":60,"ResourceRecords":[{"Value":"record4-ap-west"}],"HealthCheckId":"ID"}}]}
>>>>
NR # 2 | NF = 1 ::
>>>>
{"Comment":"json data","Changes":[{"Action":"DELETE","ResourceRecordSet":{"Name":"record4.","Type":"CNAME","SetIdentifier":"record4","GeoLocation":{"CountryCode":"*"},"TTL":60,"ResourceRecords":[{"Value":"record4-ap-west"}],"HealthCheckId":"ID"}}]}
>>>>
NR # 3 | NF = 19 ::
>>>>
{
  "Comment":"json data",
  "Changes":[
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record4.",
        "Type":"CNAME",
        "SetIdentifier":"record4",
        "GeoLocation":{
          "CountryCode":"*"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record4-ap-west"
          }
        ],
        "HealthCheckId":"ID"
      }}]}
>>>>
NR # 4 | NF = 19 ::
>>>>
{
  "Comment":"json data",
  "Changes":[
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record4.",
        "Type":"CNAME",
        "SetIdentifier":"record4",
        "GeoLocation":{
          "CountryCode":"*"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record4-ap-west"
          }
        ],
        "HealthCheckId":"ID"
      }}]}
>>>>
NR # 5 | NF = 19 ::
>>>>
{
  "Comment":"json data",
  "Changes":[
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record4.",
        "Type":"CNAME",
        "SetIdentifier":"record4",
        "GeoLocation":{
          "CountryCode":"*"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record4-ap-west"
          }
        ],
        "HealthCheckId":"ID"
      }}]}
>>>>
Then once you've been able to isolate the individual "Changes" records, outputting every 830 rows should be relatively straightforward.
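A minimal sketch of that grouping step, assuming one compact JSON record per row (records.jsonl and the part<N> names are illustrative):
awk -v n=830 '{
    f = "part" int((NR - 1) / n)      # start a new file every n rows
    print > f
    if (NR % n == 0) close(f)
}' records.jsonl
Each part would still need the {"Comment": ..., "Changes": [ ... ]} envelope (and commas between records) added back to be a valid change batch.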
You can pipe that output further downstream to confirm it is valid JSON, via:
... | awk '/^[{]/,/[}][}][]][}]$/' | jq
As long as the input structure is very well defined, awk can handle JSON just fine without needing a dedicated parser.