I have a large file called Metadata_01.json.
It consists of blocks that follow this structure:
[
{
"Participant_id": "P04_00001",
"no_of_people": "Multiple",
"apparent_gender": "F",
"geographic_location": "AUS",
"ethnicity": "Caucasian",
"capture_device_used": "iOS 14",
"camera_orientation": "Portrait",
"camera_position": "Side View",
"indoor_outdoor_env": "Indoors",
"lighting_condition": "Bright",
"Occluded": 1,
"category": "Two Person",
"camera_movement": "Still",
"action": "No action",
"indoor_outdoor_in_moving_car_or_train": "Indoor",
"daytime_nighttime": "Nighttime"
},
{
"Participant_id": "P04_00002",
"no_of_people": "Single",
"apparent_gender": "M",
"geographic_location": "AUS",
"ethnicity": "Caucasian",
"capture_device_used": "iOS 14",
"camera_orientation": "Portrait",
"camera_position": "Frontal View",
"indoor_outdoor_env": "Outdoors",
"lighting_condition": "Bright",
"Occluded": "None",
"category": "Animals",
"camera_movement": "Still",
"action": "Small action",
"indoor_outdoor_in_moving_car_or_train": "Outdoor",
"daytime_nighttime": "Daytime"
},
And so on... thousands of them.
I am using the following command:
jq -cr '.[]' Metadata_01.json | awk '{print > (NR ".json")}'
And it mostly does what I expect.
From the large file I get many files named after the line number (1.json, 2.json, and so on), each holding one object on a single line.
Instead, I need each JSON file to be named after its "Participant_id" (e.g. P04_00002.json), and I want each file to keep the multi-line JSON structure, like this:
{
"Participant_id": "P04_00002",
"no_of_people": "Single",
"apparent_gender": "M",
"geographic_location": "AUS",
"ethnicity": "Caucasian",
"capture_device_used": "iOS 14",
"camera_orientation": "Portrait",
"camera_position": "Frontal View",
"indoor_outdoor_env": "Outdoors",
"lighting_condition": "Bright",
"Occluded": "None",
"category": "Animals",
"camera_movement": "Still",
"action": "Small action",
"indoor_outdoor_in_moving_car_or_train": "Outdoor",
"daytime_nighttime": "Daytime"
}
What adjustments should I make to the command above to achieve this? Or maybe there's an easier way to do this? Thank you!
CodePudding user response:
I would recommend using PowerShell, since working with objects tends to be easier overall. PowerShell has a ConvertFrom-Json cmdlet that converts the JSON text into a PowerShell object, letting you reference properties via dot notation (.Participant_id). Then you just have to convert each object back to JSON and export it. Here I use New-Item to create each file with that output, but piping to Out-File would work as well.
$json = Get-Content -Path '.\Metadata_01.json' -Raw | ConvertFrom-Json
foreach ($json_object in $json)
{
New-Item -Path ".\Desktop\" -Name "$($json_object.Participant_id).json" -Value (ConvertTo-Json -InputObject $json_object) -ItemType 'File' -Force
}
The issue you will probably run into is running out of memory, because this example loads the whole file into a variable first. There are ways around that, but this is for demonstration purposes.
CodePudding user response:
What adjustments should I make ...?
I'd go with:
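jq -cr '.[] | (.Participant_id, .)' Metadata_01.json | awk '
# odd lines carry the id, even lines the compact object
NR%2==1 {id=$0; next}
# build the name first (portable across awks), then close each file to avoid hitting the open-file limit
{fn = "id." id ".json"; print > fn; close(fn)}
'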
One potential disadvantage of the above is that the output files will not be pretty-printed, but that can be dealt with in a number of ways, e.g. by getting awk to call jq.
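As a rough sketch of the "awk calls jq" idea, each compact object can be piped from awk into a second jq that pretty-prints it. This assumes the Participant_id values contain only shell-safe characters (as in P04_00001), since the id is pasted into a shell command. The file sample.json and its contents are made up for the demo, and the files are named after the id alone, as the question asks, without the id. prefix.

```shell
#!/bin/sh
# Small stand-in for Metadata_01.json (sample.json is a made-up name).
cat > sample.json <<'EOF'
[
  {"Participant_id": "P04_00001", "no_of_people": "Multiple", "category": "Two Person"},
  {"Participant_id": "P04_00002", "no_of_people": "Single", "category": "Animals"}
]
EOF

# jq emits two lines per object: the raw id, then the compact object.
jq -cr '.[] | (.Participant_id, .)' sample.json | awk '
  NR % 2 == 1 { id = $0; next }        # odd lines carry the id
  {
    cmd = "jq . > \"" id ".json\""     # e.g. jq . > "P04_00001.json"
    print | cmd                        # feed the compact object to jq
    close(cmd)                         # wait for jq and free the pipe
  }
'
```

This starts one jq process per object, so it will be noticeably slower than the plain awk version on thousands of records.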
"Big Data"
Of course, if the input file is so large that even jq empty runs out of memory or takes too long on it, then you will want to consider alternatives, e.g. jq's --stream option, jstream, or my own jm. For example, if you want the JSON to be pretty-printed in each file:
while read -r json
do
fn=$(jq -r .Participant_id <<< "$json")
<<< "$json" jq . > "id.$fn.json"
done < <(jm Metadata_01.json)
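If jm is not at hand, roughly the same loop can be driven by jq's own --stream option. The sketch below uses the fromstream(1 | truncate_stream(inputs)) idiom to emit the top-level array's items one at a time without loading the whole array. The file sample_stream.json and its ids are made up for the demo, and the output files are named after the id alone.

```shell
#!/bin/sh
# Small stand-in for Metadata_01.json (sample_stream.json is a made-up name).
cat > sample_stream.json <<'EOF'
[
  {"Participant_id": "P04_00003", "category": "Two Person"},
  {"Participant_id": "P04_00004", "category": "Animals"}
]
EOF

# Stream the array and rebuild each top-level item as one compact line.
jq -cn --stream 'fromstream(1 | truncate_stream(inputs))' sample_stream.json |
while IFS= read -r json
do
  # Pull the id out of the object, then pretty-print the object into <id>.json.
  fn=$(printf '%s\n' "$json" | jq -r '.Participant_id')
  printf '%s\n' "$json" | jq . > "$fn.json"
done
```

Streaming keeps memory use low but is slower than the regular parser, and the loop still runs jq twice per object, so this trades speed for being able to handle very large inputs.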