Extract data from invalid JSON using bash, sed, grep or awk?

Time:10-08

I am trying to parse invalid JSON in bash

x="{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVj, componentName: Approves, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVe, componentName: activityThreads, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVf, componentName: Attachments, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVh, componentName: Details, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"

using the following script

for each in $(echo $x | sed 's/{componentId: /\n/g' ); do
    echo "Each: $each"
    echo [[ $each == 0Rb* ]]
    if [[ $each == 0Rb* ]]; then
        component=echo $each | awk -v FS="(componentName: |,|referenceName: |,)" '{print $3}'
        reference=echo $each | awk -v FS="(componentName: |,|referenceName: |,)" '{print $6}'
        echo "component: $component"
        echo "reference: $component"
    fi
done

but it doesn't work, and I don't understand why. When I run this line in the console,

echo $x | sed 's/{componentId: /\n/g' 

I can see that the invalid JSON is split into lines correctly, but when I pass the same output to the for loop, the variable each receives much smaller chunks as its value, for example:

Each: 00N5E000005vm9e,

I am confused.

What I am trying to do is extract the value between componentName: and the following comma, and the value between referenceName: and the following comma, for each item in the invalid JSON whose componentId doesn't start with 00N. Is there a way to achieve this?

I have also tried to use jq -n $x but it fails with jq: error: syntax error, unexpected IDENT, expecting '}' (Unix shell quoting issues?) at <top-level>, line 1:

CodePudding user response:

Convert it to valid JSON with sed, e.g.:

# Remove redundant space (assuming the text is in the `x` variable)
<<<"$x" sed 's/: /:/g'     |

# Quote all "words"
sed -E 's/[^"{}:,]+/"&"/g' |

# Separate objects
sed 's/," "/\n/g'          |

# Parse json
jq .

Output:

{
  "componentId": "00N5E000005vm9e",
  " componentName": "Field",
  " referenceId": "0M05E0000002XbV",
  " referenceName": "RecordPageName1",
  " referenceUrl": "null",
  " message": "Component is in use by another component in your organization.",
  " reasonCode": "10"
}
{
  "componentId": "00N5E000005vm9e",
  " componentName": "Field",
  " referenceId": "0M05E0000002XbV",
  " referenceName": "RecordPageName1",
  " referenceUrl": "null",
  " message": "Component is in use by another component in your organization.",
  " reasonCode": "10"
}
{
  "componentId": "00N5E000005vm9e",
  " componentName": "Field",
  " referenceId": "0M05E0000002XbV",
  " referenceName": "RecordPageName1",
  " referenceUrl": "null",
  " message": "Component is in use by another component in your organization.",
  " reasonCode": "10"
}
{
  "componentId": "0Rb5E000000BGVi",
  " componentName": "Versions",
  " referenceId": "0M05E0000002XbV",
  " referenceName": "RecordPageName1",
  " referenceUrl": "null",
  " message": "Component is in use by another component in your organization.",
  " reasonCode": "10"
}
{
  "componentId": "0Rb5E000000BGVj",
  " componentName": "Approves",
  " referenceId": "0M05E0000002XbV",
  " referenceName": "RecordPageName1",
  " referenceUrl": "null",
  " message": "Component is in use by another component in your organization.",
  " reasonCode": "10"
}
{
  "componentId": "0Rb5E000000BGVe",
  " componentName": "activityThreads",
  " referenceId": "0M05E0000002XbV",
  " referenceName": "RecordPageName1",
  " referenceUrl": "null",
  " message": "Component is in use by another component in your organization.",
  " reasonCode": "10"
}
{
  "componentId": "0Rb5E000000BGVf",
  " componentName": "Attachments",
  " referenceId": "0M05E0000002XbV",
  " referenceName": "RecordPageName1",
  " referenceUrl": "null",
  " message": "Component is in use by another component in your organization.",
  " reasonCode": "10"
}
{
  "componentId": "0Rb5E000000BGVh",
  " componentName": "Details",
  " referenceId": "0M05E0000002XbV",
  " referenceName": "RecordPageName1",
  " referenceUrl": "null",
  " message": "Component is in use by another component in your organization.",
  " reasonCode": "10"
}
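
To get from this output to the values the question asks for, a jq filter along the following lines might work. This is only a sketch: it assumes jq is installed, the two sample objects are trimmed to three keys for brevity, and the variable names json and result are made up for illustration. Note that every key except componentId still carries the leading space left over from the sed conversion, so it has to be spelled out in the filter.

```shell
#!/usr/bin/env sh
# Two sample objects in the shape printed above (trimmed to three keys).
# Every key except componentId keeps its leading space.
json='{"componentId":"00N5E000005vm9e"," componentName":"Field"," referenceName":"RecordPageName1"}
{"componentId":"0Rb5E000000BGVi"," componentName":"Versions"," referenceName":"RecordPageName1"}'

# Keep only objects whose componentId does not start with 00N,
# then print the component name and the reference name on separate lines.
result=$(printf '%s\n' "$json" | jq -r '
    select(.componentId | startswith("00N") | not)
    | "component: " + .[" componentName"],
      "reference: " + .[" referenceName"]
')

printf '%s\n' "$result"
```

In a real run the printf line would be replaced by the sed pipeline above, with the final "jq ." swapped for this filter.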

CodePudding user response:

Thanks for the comments. It looks like I have figured this out:

# IFS= (empty) keeps each line intact; an unquoted IFS=\n would set IFS to the letter n
echo "$x" | sed 's/{componentId: /\n/g' | while IFS= read -r each; do
    #echo "Each: $each"
    #echo [[ $each == 0Rb* ]]
    if [[ $each == 0Rb* ]]; then
        component=$(echo "$each" | awk -v FS="(componentName: |,|referenceName: |,)" '{print $3}')
        reference=$(echo "$each" | awk -v FS="(componentName: |,|referenceName: |,)" '{print $6}')
        echo "component: $component"
        echo "reference: $reference"
    fi
done
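
For what it's worth, the original for loop most likely failed because an unquoted $(...) is split on every character in IFS (space, tab and newline by default), not only on newlines, so each word became its own iteration. A while loop with read consumes one line at a time instead. Here is a minimal sketch of the difference, using a shortened, made-up two-line sample rather than the real data:

```shell
#!/usr/bin/env sh
# Shortened, hypothetical sample: two lines in the same style as the sed output.
sample='0Rb1, componentName: Versions
0Rb2, componentName: Details'

# for + unquoted command substitution: the text is split on spaces,
# tabs and newlines, so every word becomes one iteration.
words=0
for each in $(printf '%s\n' "$sample"); do
    words=$((words + 1))
done

# while + read: one iteration per line. IFS= keeps leading and trailing
# spaces, and -r keeps backslashes literal.
lines=0
while IFS= read -r each; do
    lines=$((lines + 1))
done <<EOF
$sample
EOF

echo "for loop iterations:   $words"
echo "while loop iterations: $lines"
```

With this sample the for loop runs six times (three words per line) and the while loop runs twice.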

CodePudding user response:

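Although the input string is not valid JSON, it is valid YAML: it is the contents of a YAML flow sequence (an array of objects) with the surrounding brackets missing. So add the brackets and parse it with a YAML parser.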

With Python:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import sys
import yaml
import json

# Your input: invalid JSON, but valid YAML elements of an array
x = "{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVj, componentName: Approves, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVe, componentName: activityThreads, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVf, componentName: Attachments, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVh, componentName: Details, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"

# Compose yamlstring from x by adding the missing data array container
yamlstring = "data: [" + x + "]"

# Load data from the yamlstring
data = yaml.load(yamlstring, yaml.SafeLoader)

# Output data as JSON
json.dump(data, sys.stdout, indent=2)

Or from a shell using yq as parser:

#!/usr/bin/env sh

x="{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVj, componentName: Approves, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVe, componentName: activityThreads, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVf, componentName: Attachments, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVh, componentName: Details, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"

yamlstring="data: [$x]"

printf %s "$yamlstring" | yq -I 4 -o json e '.' -