Home > Enterprise >  JOLT: Merge specific data from JSON array using id key
JOLT: Merge specific data from JSON array using id key

Time:11-11

I'm getting data in an specific way from an API and I have to convert it to a cleaner version of it.

What I get from the API is a JSON like this (you can see that there is some information duplicated as for the first fields but the investor is different).

{
  "clubhouse": [
    {
      "id": "01",
      "statusId": "ok",
      "stateid": "2",
      "TypeId": "3",
      "investors": [
        {
          "investor": {
            "id": "1234",
            "gender": "01"
          },
          "inamount": "1500000",
          "ratio": "12"
        }
      ]
    },
    {
      "id": "01",
      "statusId": "ok",
      "stateid": "2",
      "TypeId": "3",
      "investors": [
        {
          "investor": {
            "id": "4321",
            "gender": "02"
          },
          "inamount": "1700000",
          "ratio": "12"
        }
      ]
    },
    {
      "id": "02",
      "statusId": "ok",
      "stateid": "2",
      "TypeId": "3",
      "investors": [
        {
          "investor": {
            "id": "1333",
            "gender": "01"
          },
          "inamount": "1500000",
          "ratio": "12"
        }
      ]
    },
    {
      "id": "03",
      "statusId": "ok",
      "stateid": "5",
      "TypeId": "3",
      "investors": [
        {
          "investor": {
            "id": "", 
            "gender": ""
          },
          "inamount": "",
          "ratio": ""
        }
      ]
    },
    {
      "id": "02",
      "statusId": "ok",
      "stateid": "2",
      "TypeId": "3",
      "investors": [
        {
          "investor": {
            "id": "1334",
            "gender": "02"
          },
          "inamount": "1900000",
          "ratio": "12"
        }
      ]
    }
  ]
}

I need to merge the investors and eliminate the duplicated information, the the expected result will be

{
  "clubhouse": [
    {
      "id": "01",
      "statusId": "ok",
      "stateid": "2",
      "TypeId": "3",
      "investors": [
        {
          "investor": {
            "id": "1234",
            "gender": "01"
          },
          "inamount": "1500000",
          "ratio": "12"
        },
        {
          "investor": {
            "id": "4321",
            "gender": "02"
          },
          "inamount": "1700000",
          "ratio": "12"
        }
      ]
    },
    {
      "id": "02",
      "statusId": "ok",
      "stateid": "2",
      "TypeId": "3",
      "investors": [
        {
          "investor": {
            "id": "1333",
            "gender": "01"
          },
          "inamount": "1500000",
          "ratio": "12"
        },
        {
          "investor": {
            "id": "1334",
            "gender": "02"
          },
          "inamount": "1900000",
          "ratio": "12"
        }
      ]
    },
    {
      "id": "03",
      "statusId": "ok",
      "stateid": "5",
      "TypeId": "3",
      "investors": [
        {
          "investor": {
            "id": "1555",
            "gender": "01"
          },
          "inamount": "2000000",
          "ratio": "15"
        }
      ]
    }
  ]
}

I'd try a couple of JOLTS and I got to merge the fields but not eliminate the duplicates.

CodePudding user response:

You can start with grouping by id values such as

[
  {
   // group by "id" values to create separate objects 
    "operation": "shift",
    "spec": {
      "*": {
        "*": {
          "*": "@(1,id).&",
          "investors": {
            "*": {
              "*": {
                "@": "@(4,id).&3[&4].&" // &3 -> going 3 levels up to grab literal "investors", [&4] -> going 4 levels up the tree in order to reach the indexes of "clubhouse" array, & -> replicate the leaf node values for the current key-value pair
              }
            }
          }
        }
      }
    }
  },
  {
    // get rid of "null" values
    "operation": "modify-overwrite-beta",
    "spec": {
      "*": "=recursivelySquashNulls"
    }
  },
  {
    // pick only the first components from the repeated values populated within the arrays 
    "operation": "cardinality",
    "spec": {
      "*": {
        "*": "ONE",
        "investors": "MANY"
      }
    }
  },
  {
    // get rid of object labels
    "operation": "shift",
    "spec": {
      "*": ""
    }
  }
]
  • Related