Home > Software engineering >  Nested JSON Transformation to insert into SQL Table - Azure Data Factory
Nested JSON Transformation to insert into SQL Table - Azure Data Factory

Time:10-21

I have connected to a REST API and the data is structured in a nested JSON format that requires some transformation before I can insert into a SQL table. I am trying to use Azure Data Factory to facilitate the transformation, but I do not know how to trim off the unnecessary data. This is what it looks like:

{
"count": 216,
"results": [
    {
        "key": "users",
        "id": "18603444"
    },
    {
        "key": "users",
        "id": "18603297"
    },
    {
        "key": "users",
        "id": "18603298"
    },
    {
        "key": "users",
        "id": "18603445"
    },
    {
        "key": "users",
        "id": "16407315"
    },
    {
        "key": "users",
        "id": "18636176"
    },
    {
        "key": "users",
        "id": "18588630"
    },
    {
        "key": "users",
        "id": "18588941"
    },
    {
        "key": "users",
        "id": "18603301"
    },
    {
        "key": "users",
        "id": "18603302"
    },
    {
        "key": "users",
        "id": "18603303"
    },
    {
        "key": "users",
        "id": "18588634"
    },
    {
        "key": "users",
        "id": "18722305"
    },
    {
        "key": "users",
        "id": "18603446"
    },
    {
        "key": "users",
        "id": "18604710"
    },
    {
        "key": "users",
        "id": "18624916"
    },
    {
        "key": "users",
        "id": "18625925"
    },
    {
        "key": "users",
        "id": "18603447"
    },
    {
        "key": "users",
        "id": "18603448"
    },
    {
        "key": "users",
        "id": "18603305"
    }
],
"users": {
    "18603444": {
        "full_name": "",
        "photo_path": "",
        "email_address": "",
        "headline": "",
        "generic": false,
        "disabled": false,
        "update_whitelist": [
            "full_name",
            "headline",
            "email_address",
            "external_reference",
            "linkedin_url",
            "bio",
            "website",
            "company_name",
            "address1",
            "address2",
            "city",
            "state",
            "zip"
        ],
        "account_id": "",
        "id": "18603444"
    },
"meta": {
    "count": 216,
    "page_count": 11,
    "page_number": 1,
    "page_size": 20
}}

Each user is structured with their ID and their respective data. I want to remove the first part with the count and results and the last part with the meta and ONLY keep the data in each ID object. I think i can delete those two pieces in the mapping of the Copy into step, but I am not sure the best practice.

How do I do this with Azure Data Factory and SQL? Or should I use an Azure Function? I am not allowed to use python or any other scripting (if it is possible). Can someone please assist?

CodePudding user response:

Add a data flow activity to your pipeline. In the data flow, add a REST source and then a Select transformation. In the Select transformation, there is a mapping panel. In there, remove the JSON properties you wish to discard.

CodePudding user response:

Here is the sample procedure for the transformation of Nested JSON to SQL DB

Create a linked service to the REST API. enter image description here

Attched dataset as the source in Copy activity. enter image description here

Create Linked Service to SQL DB as sink. enter image description here enter image description here

Dataset of SQL database attached at the sink to copy activity. enter image description here

In the Mapping, Import the schema from the data remove the unnecessary fields by selecting and delete the columns. enter image description here

Result we get as per the expected in SQL DB with required data from the Nested JSON. enter image description here

  • Related