I am trying to parse invalid JSON in bash
x="{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVj, componentName: Approves, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVe, componentName: activityThreads, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVf, componentName: Attachments, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVh, componentName: Details, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"
using the following script
for each in $(echo $x | sed 's/{componentId: /\n/g' ); do
echo "Each: $each"
echo [[ $each == 0Rb* ]]
if [[ $each == 0Rb* ]]; then
component=echo $each | awk -v FS="(componentName: |,|referenceName: |,)" '{print $3}'
reference=echo $each | awk -v FS="(componentName: |,|referenceName: |,)" '{print $6}'
echo "component: $component"
echo "reference: $component"
fi
done
but it doesn't work, and I don't understand why. When I execute this line in the console,
echo $x | sed 's/{componentId: /\n/g'
I can see that the invalid JSON is split into lines correctly, but when I pass it to the for loop, the each variable receives much smaller chunks as its value:
Each: 00N5E000005vm9e,
I am confused.
What I am trying to do is extract the value between componentName: and , and another value between referenceName: and , for each item in the invalid JSON whose componentId doesn't start with 00N. Is there a way to achieve this?
I have also tried jq -n $x, but it fails with jq: error: syntax error, unexpected IDENT, expecting '}' (Unix shell quoting issues?) at <top-level>, line 1:
CodePudding user response:
Convert it back to valid JSON with sed, e.g.:
# Remove redundant space (assuming the text is in the `x` variable)
<<<"$x" sed 's/: /:/g' |
# Quote all "words"
sed -E 's/[^"{}:,]+/"&"/g' |
# Separate objects
sed 's/," "/\n/g' |
# Parse json
jq .
Output:
{
"componentId": "00N5E000005vm9e",
" componentName": "Field",
" referenceId": "0M05E0000002XbV",
" referenceName": "RecordPageName1",
" referenceUrl": "null",
" message": "Component is in use by another component in your organization.",
" reasonCode": "10"
}
{
"componentId": "00N5E000005vm9e",
" componentName": "Field",
" referenceId": "0M05E0000002XbV",
" referenceName": "RecordPageName1",
" referenceUrl": "null",
" message": "Component is in use by another component in your organization.",
" reasonCode": "10"
}
{
"componentId": "00N5E000005vm9e",
" componentName": "Field",
" referenceId": "0M05E0000002XbV",
" referenceName": "RecordPageName1",
" referenceUrl": "null",
" message": "Component is in use by another component in your organization.",
" reasonCode": "10"
}
{
"componentId": "0Rb5E000000BGVi",
" componentName": "Versions",
" referenceId": "0M05E0000002XbV",
" referenceName": "RecordPageName1",
" referenceUrl": "null",
" message": "Component is in use by another component in your organization.",
" reasonCode": "10"
}
{
"componentId": "0Rb5E000000BGVj",
" componentName": "Approves",
" referenceId": "0M05E0000002XbV",
" referenceName": "RecordPageName1",
" referenceUrl": "null",
" message": "Component is in use by another component in your organization.",
" reasonCode": "10"
}
{
"componentId": "0Rb5E000000BGVe",
" componentName": "activityThreads",
" referenceId": "0M05E0000002XbV",
" referenceName": "RecordPageName1",
" referenceUrl": "null",
" message": "Component is in use by another component in your organization.",
" reasonCode": "10"
}
{
"componentId": "0Rb5E000000BGVf",
" componentName": "Attachments",
" referenceId": "0M05E0000002XbV",
" referenceName": "RecordPageName1",
" referenceUrl": "null",
" message": "Component is in use by another component in your organization.",
" reasonCode": "10"
}
{
"componentId": "0Rb5E000000BGVh",
" componentName": "Details",
" referenceId": "0M05E0000002XbV",
" referenceName": "RecordPageName1",
" referenceUrl": "null",
" message": "Component is in use by another component in your organization.",
" reasonCode": "10"
}
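If only the componentName and referenceName of items whose componentId does not start with 00N are needed, the same idea can be taken one step further. The sketch below is a variant, not the exact pipeline above: it also removes the space after each comma so the keys come out without a leading space, and it assumes that no value contains a colon, comma, brace or double quote (true for the sample, not guaranteed in general). The shortened x is illustrative only.

```shell
#!/usr/bin/env bash
# Shortened, illustrative input in the same format as the question's data.
x='{componentId: 00N5E000005vm9e, componentName: Field, referenceName: RecordPageName1, reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceName: RecordPageName1, reasonCode: 10}'

result=$(
    printf '%s\n' "$x" |
    # 1. drop the space after ':' and ','
    # 2. wrap every run of non-structural characters in double quotes
    # 3. separate the objects with a space instead of a comma
    sed -E 's/: /:/g; s/, /,/g; s/[^"{}:,]+/"&"/g; s/[}],[{]/} {/g' |
    # keep only objects whose componentId does not start with 00N
    jq -r 'select(.componentId | startswith("00N") | not)
           | "component: \(.componentName)\nreference: \(.referenceName)"'
)

echo "$result"
```

Every value ends up as a JSON string here (including null and 10), which is fine for printing but matters if the values are compared or used as numbers later.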
CodePudding user response:
Thanks for the comments; it looks like I have figured this out.
echo "$x" | sed 's/{componentId: /\n/g' | while IFS= read -r each; do
    # IFS= (empty) makes read keep each line intact; IFS=\n would set IFS to the letter n
    #echo "Each: $each"
    #echo [[ $each == 0Rb* ]]
    if [[ $each == 0Rb* ]]; then
        component=$(echo "$each" | awk -v FS="(componentName: |,|referenceName: |,)" '{print $3}')
        reference=$(echo "$each" | awk -v FS="(componentName: |,|referenceName: |,)" '{print $6}')
        echo "component: $component"
        echo "reference: $reference"
    fi
done
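For anyone wondering why the original for loop saw such small chunks: an unquoted expansion in for ... in is split on every space, tab and newline, not only on newlines, whereas while read consumes one line per iteration. A minimal sketch of the difference, using two shortened, made-up records instead of the real data:

```shell
#!/usr/bin/env bash
# Two shortened sample records, already split onto separate lines.
lines=$'0Rb1, componentName: Versions\n0Rb2, componentName: Details'

# Unquoted expansion: the shell splits on spaces, tabs and newlines alike.
for_count=0
for each in $lines; do
    for_count=$((for_count + 1))
done

# while read: one iteration per line; the empty IFS keeps the line unmodified.
read_count=0
while IFS= read -r each; do
    read_count=$((read_count + 1))
done <<< "$lines"

echo "for loop iterations:   $for_count"   # 6
echo "while read iterations: $read_count"  # 2
```

The here-string (<<<) is used here instead of a pipe so that the counters survive the loop; with cmd | while ..., bash runs the loop in a subshell and variables set inside it are lost afterwards.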
CodePudding user response:
This input string is the body of a YAML array of objects (flow style) with the enclosing array missing, so wrap it in one and parse it with a YAML parser.
With Python:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import sys
import yaml
import json
# Your input: invalid JSON, but valid YAML (the elements of an array)
x = "{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVj, componentName: Approves, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVe, componentName: activityThreads, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVf, componentName: Attachments, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVh, componentName: Details, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"
# Compose yamlstring from x by adding the missing data array container
yamlstring = "data: [" + x + "]"
# Load data from the yamlstring
data = yaml.load(yamlstring, yaml.SafeLoader)
# Output data as JSON
json.dump(data, sys.stdout, indent=2)
Or from a shell, using yq as the parser:
#!/usr/bin/env sh
x="{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVj, componentName: Approves, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVe, componentName: activityThreads, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVf, componentName: Attachments, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVh, componentName: Details, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"
yamlstring="data: [$x]"
printf %s "$yamlstring" | yq -I 4 -o json e '.' -