I currently have one json file as follows in terms of formatting:
{
  "Comment":"json data",
  "Changes":[
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record1",
        "Type":"CNAME",
        "SetIdentifier":"record1-ap-northeast",
        "GeoLocation":{
          "CountryCode":"JP"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record1"
          }
        ],
        "HealthCheckId":"ID"
      }
    },
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record2",
        "Type":"CNAME",
        "SetIdentifier":"record2-ap-south",
        "GeoLocation":{
          "CountryCode":"SG"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record2"
          }
        ],
        "HealthCheckId":"ID"
      }
    },
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record3",
        "Type":"CNAME",
        "SetIdentifier":"record3-ap-west",
        "GeoLocation":{
          "CountryCode":"IN"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record3"
          }
        ],
        "HealthCheckId":"ID"
      }
    },
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record4.",
        "Type":"CNAME",
        "SetIdentifier":"record4",
        "GeoLocation":{
          "CountryCode":"*"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record4-ap-west"
          }
        ],
        "HealthCheckId":"ID"
      }
    }
  ]
}
The original file has 20000 such values under the "Changes" key. I want to split it into files of 830 values each, creating as many files as needed. To achieve this I need it in the format below:
{
  "Comment":"json data",
  "Changes":[
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record4.",
        "Type":"CNAME",
        "SetIdentifier":"record4",   #830 such arrays in each file
        "GeoLocation":{
          "CountryCode":"*"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record4-ap-west"
          }
        ],
        "HealthCheckId":"ID"
      }
    }
  ]
}
I've created the shell script below to do this:
#!/bin/bash
# Set the input file name
input_file="input.json"
# Set the output file prefix
output_file_prefix="output"
# Set the number of objects per output file
objects_per_file=830
# Skip the first two lines of the input file
tail -n +3 "$input_file" > temp.json
# Get the total number of lines in the input file
total_lines=$(wc -l < temp.json)
# Calculate the number of output files needed
output_files=$(((total_lines + objects_per_file - 1) / objects_per_file))
# Split the input file into multiple output files
split -l $objects_per_file temp.json "$output_file_prefix"
# Loop through each output file and add the opening and closing square brackets
for file in "$output_file_prefix"*; do
echo "[" > "$file".json
cat "$file" >> "$file".json
echo "]" >> "$file".json
rm "$file"
done
# Remove the temporary file
rm temp.json
**By using this I am getting output, but it is broken: the script counts 830 lines rather than 830 objects.** Format:
#start of file
[
  {
    "Action": "DELETE",
    "ResourceRecordSet":
    {
      "Name": "record1",
      "Type": "CNAME",
      "SetIdentifier": "record1-ap-northeast",
      "GeoLocation": {
        "CountryCode": "JP"
      },
      "TTL": 60,
      "ResourceRecords": [
        {
          "Value": "record1"
        }
      ],
      "HealthCheckId": "ID"
    }
  },
#end of file
  {
    "Action": "DELETE",
    "ResourceRecordSet":
    {
      "Action": "DELETE",
      "ResourceRecordSet":
      {
        "Name": "record4.",
        "Type": "CNAME",
        "SetIdentifier": "record4"
]
How do I achieve the required result? Due to a character limitation I cannot have more than 830 such objects in each file. I tried using the jq tool to achieve this but I am completely new to it. Could you please help me with this?
CodePudding user response:
If you wish to use jq, you will have to do it in two or three steps. Each step, however, is very easy.
The first step uses jq with the -c option to create a JSONLines file with the JSON objects you want:
< input.json jq -c '
(.Changes | _nwise(830)) as $C # 830 per problem statement
| .Changes = $C
' > output.jsonl
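Note that _nwise is an undocumented jq internal and may not be exposed in every build; if yours lacks it, an equivalent helper can be defined inline:
< input.json jq -c '
  def nwise($n): def n: if length <= $n then . else .[0:$n] , (.[$n:] | n) end; n;
  (.Changes | nwise(830)) as $C
  | .Changes = $C
' > output.jsonl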
Next, partition output.jsonl into the files you want. This can be done in many ways, e.g. using awk, or even the shell's read.
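Since each line of output.jsonl is already a complete change-batch document, one file per line suffices; a minimal sketch with awk (the output<N>.json names are just an example):
awk '{ f = "output" NR ".json"; print > f; close(f) }' output.jsonl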
Finally, if you want the separate files to be "pretty-printed", you could use jq to do that in the obvious way.
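For instance, assuming the output<N>.json names from the previous sketch:
for f in output*.json; do
  jq . "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done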
CodePudding user response:
So even assuming some of the JSON "rows" are compacted into one line (e.g. the equivalent of jq -c) while others are pretty-printed in a tree format, all you need is the right regex in awk to identify the record separator ("RS"):
gcat <( printf '%s' "$json_in_1$json_in_1$ajson_in_1" | jq -c ) \
     <( printf '%s\n%s\n%s' "$json_in_1" "$json_in_1" "$json_in_1" ) |
{m,g,n}awk '
BEGIN {
    RS  = (_ = "[[:space:]]*") (__ = "[}]") \
          (_)__ (_)"[]]" (_)__ (FS = "\n") "?"
    ORS = (_ = "}")_ ("]")_ FS
    OFS = "\f\r\t"
    _ =_^= __ = (_<_)
}
{
    printf(" NR # %d | NF = %d :: %s>>>>%s%s%.*s%s>>>>%s",
           NR, NF, FS, FS, $__, _<NF, FS, ORS, FS, FS)
}'
NR # 1 | NF = 1 ::
>>>>
{"Comment":"json data","Changes":[{"Action":"DELETE","ResourceRecordSet":{"Name":"record4.","Type":"CNAME","SetIdentifier":"record4","GeoLocation":{"CountryCode":"*"},"TTL":60,"ResourceRecords":[{"Value":"record4-ap-west"}],"HealthCheckId":"ID"}}]}
>>>>
NR # 2 | NF = 1 ::
>>>>
{"Comment":"json data","Changes":[{"Action":"DELETE","ResourceRecordSet":{"Name":"record4.","Type":"CNAME","SetIdentifier":"record4","GeoLocation":{"CountryCode":"*"},"TTL":60,"ResourceRecords":[{"Value":"record4-ap-west"}],"HealthCheckId":"ID"}}]}
>>>>
NR # 3 | NF = 19 ::
>>>>
{
  "Comment":"json data",
  "Changes":[
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record4.",
        "Type":"CNAME",
        "SetIdentifier":"record4",
        "GeoLocation":{
          "CountryCode":"*"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record4-ap-west"
          }
        ],
        "HealthCheckId":"ID"
      }}]}
>>>>
NR # 4 | NF = 19 ::
>>>>
{
  "Comment":"json data",
  "Changes":[
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record4.",
        "Type":"CNAME",
        "SetIdentifier":"record4",
        "GeoLocation":{
          "CountryCode":"*"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record4-ap-west"
          }
        ],
        "HealthCheckId":"ID"
      }}]}
>>>>
NR # 5 | NF = 19 ::
>>>>
{
  "Comment":"json data",
  "Changes":[
    {
      "Action":"DELETE",
      "ResourceRecordSet":{
        "Name":"record4.",
        "Type":"CNAME",
        "SetIdentifier":"record4",
        "GeoLocation":{
          "CountryCode":"*"
        },
        "TTL":60,
        "ResourceRecords":[
          {
            "Value":"record4-ap-west"
          }
        ],
        "HealthCheckId":"ID"
      }}]}
>>>>
Then once you've been able to isolate the individual "Changes" records, outputting every 830 rows should be relatively straightforward.
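A minimal sketch of that grouping step, assuming one compact JSON record per row (records.jsonl and the part<N> names are illustrative):
awk -v n=830 '{
    f = "part" int((NR - 1) / n)      # start a new file every n rows
    print > f
    if (NR % n == 0) close(f)
}' records.jsonl
Each part would still need the {"Comment": ..., "Changes": [ ... ]} envelope (and commas between records) added back to be a valid change batch.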
You can pipe that output further downstream to confirm it is valid JSON, via:
... | awk '/^[{]/,/[}][}][]][}]$/' | jq
As long as the input structure is very well defined, awk can handle JSON just fine without needing a dedicated parser.