Filter out all occurrences of given properties from json object

Time:05-07

I'm using PowerShell to extract data from an API call, update it and then pass it back to the API.

What I would like to know is whether there is a simple way to modify the JSON object so that all unwanted properties are filtered out, wherever they occur within the JSON structure.

I've tried the following, but in the resulting JSON only the top-level property (i.e. "p2") has been removed:

$example = ConvertFrom-Json '{"a":{"p1": "value1"},"p2": "value2", "b":"valueb"}'
$exclude = "p1", "p2"
$clean = $example | Select-Object -Property * -ExcludeProperty $exclude
ConvertTo-Json $clean -Compress

Result => {"a":{"p1":"value1"},"b":"valueb"}

I would like to have all $exclude entries removed, regardless of where they are located within the JSON. Is there a simple solution?

Update

Here is another (more complicated) JSON example:

{
  "a": {
    "p1": "value 1",
    "c": "value c",
    "d": {
      "e": "value e",
      "p2": "value 3"
    },
    "f": [
      {
      "g": "value ga",
      "p1": "value 4a"
      },
      {
      "g": "value gb",
      "p1": "value 4b"
      }
    ]
  },
  "p2": "value 2",
  "b": "value b"
}

The expected result (all p1 and p2 keys removed):

{
  "a": {
    "c": "value c",
    "d": {
      "e": "value e"
    },
    "f": [
      {
        "g": "value ga"
      },
      {
        "g": "value gb"
      }
    ]
  },
  "b": "value b"
}

CodePudding user response:

Unfortunately there doesn't appear to be an easy way. It actually proved quite challenging to correctly handle arrays. My approach is to recursively unroll the input (JSON) object, including any arrays, so we can easily apply filtering, then build a new object from the filtered properties.

The first and last steps (unrolling and rebuilding) are wrapped in the following reusable helper functions, one for unrolling (ConvertTo-FlatObjectValues) and one for rebuilding the object (ConvertFrom-FlatObjectValues). There is a third function (ConvertFrom-TreeHashTablesToArrays), but it is only used internally by ConvertFrom-FlatObjectValues.

Function ConvertTo-FlatObjectValues {
    <#
    .SYNOPSIS
        Unrolls a nested PSObject/PSCustomObject "property bag".
    .DESCRIPTION
        Unrolls a nested PSObject/PSCustomObject "property bag" such as created by ConvertFrom-Json into flat objects consisting of path, name and value.
        Fully supports arrays at the root as well as for properties and nested arrays.
    #>
    [CmdletBinding()]
    param (
        [Parameter(Mandatory, ValueFromPipeline)] $InputObject,
        [string] $Separator = '.',
        [switch] $KeepEmptyObjects,
        [switch] $KeepEmptyArrays,
        [string] $Path,    # Internal parameter for recursion.
        [string] $Name     # Internal parameter for recursion.
    )
    
    process {
        if( $InputObject -is [System.Collections.IList] ) {

            if( $KeepEmptyArrays ) {
                # Output a special item to keep empty array.
                [PSCustomObject]@{ 
                    Path  = ($Path, "#").Where{ $_ } -join $Separator
                    Name  = $Name
                    Value = $null
                }
            }

            $i = 0
            $InputObject.ForEach{
                # Recursively unroll array elements.
                $childPath = ($Path, "#$i").Where{ $_ } -join $Separator
                ConvertTo-FlatObjectValues -InputObject $_ -Path $childPath -Name $Name `
                                           -Separator $Separator -KeepEmptyObjects:$KeepEmptyObjects -KeepEmptyArrays:$KeepEmptyArrays
                $i++
            }
        }
        elseif( $InputObject -is [PSObject] ) {

            if( $KeepEmptyObjects ) {
                # Output a special item to keep empty object.
                [PSCustomObject]@{ 
                    Path  = $Path
                    Name  = $Name
                    Value = [ordered] @{}
                }
            }

            $InputObject.PSObject.Properties.ForEach{
                # Recursively unroll object properties.
                $childPath = ($Path, $_.Name).Where{ $_ } -join $Separator
                ConvertTo-FlatObjectValues -InputObject $_.Value -Path $childPath -Name $_.Name `
                                           -Separator $Separator -KeepEmptyObjects:$KeepEmptyObjects -KeepEmptyArrays:$KeepEmptyArrays
            }
        }
        else {
            # Output scalar

            [PSCustomObject]@{ 
                Path  = $Path
                Name  = $Name
                Value = $InputObject 
            }
        }
    }
}

function ConvertFrom-FlatObjectValues {
    <#
    .SYNOPSIS
        Convert a flat list consisting of path and value into tree(s) of PSCustomObject.
    .DESCRIPTION
        Convert a flat list consisting of path and value, such as generated by ConvertTo-FlatObjectValues, into tree(s) of PSCustomObject.
        The output can either be an array (not unrolled) or a PSCustomObject, depending on the structure of the input data.
    #>
    [CmdletBinding()]
    param (
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Path,
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [AllowNull()] $Value,
        [Parameter()] [string] $Separator = '.'
    )

    begin {
        $tree = [ordered]@{}
    }

    process {
        # At first store everything (including array elements) into hashtables. 

        $branch = $Tree

        do {
            # Split path into root key and path remainder.
            $key, $path = $path.Split( $Separator, 2 )

            if( $path ) {
                # We have multiple path components, so we may have to create nested hash table.
                if( -not $branch.Contains( $key ) ) {
                    $branch[ $key ] = [ordered] @{}
                }           
                # Enter sub tree. 
                $branch = $branch[ $key ]
            }
            else {
                # We have arrived at the leaf -> set its value
                $branch[ $key ] = $value
            }
        }
        while( $path )
    }

    end {
        # So far we have stored the original arrays as hashtables with keys like '#0', '#1', ... (possibly non-consecutive).
        # Now convert these hashtables back into actual arrays and generate PSCustomObject's from the remaining hashtables.
        ConvertFrom-TreeHashTablesToArrays $tree
    }
}

Function ConvertFrom-TreeHashTablesToArrays {
    <#
    .SYNOPSIS
        Internal function called by ConvertFrom-FlatObjectValues.
    .DESCRIPTION
        - Converts arrays stored as hashtables into actual arrays.
        - Converts any remaining hashtables into PSCustomObject's. 
    #>
    [CmdletBinding()]
    param (
        [Parameter(Mandatory, ValueFromPipeline)] [Collections.IDictionary] $InputObject
    )

    process {    
        # Check if $InputObject has been generated from an array.
        $isArray = foreach( $key in $InputObject.Keys ) { $key.StartsWith('#'); break }

        if( $isArray ) {
            # Sort array indices as they might be unordered. A single '#' as key will be skipped, because it denotes an empty array.
            $sortedByKeyNumeric = $InputObject.GetEnumerator().Where{ $_.Key -ne '#' } | 
                                   Sort-Object { [int]::Parse( $_.Key.SubString( 1 ) ) }

            $outArray = $sortedByKeyNumeric.ForEach{
                
                if( $_.Value -is [Collections.IDictionary] ) {
                    # Recursion. Output array element will either be an object or a nested array.
                    ConvertFrom-TreeHashTablesToArrays $_.Value
                }
                else {
                    # Output array element is a scalar value.
                    $_.Value
                }
            }

            , $outArray  # Comma-operator prevents unrolling of the array, to support nested arrays.
        }
        else {
            # $InputObject has been generated from an object. Copy it to $outProps recursively and output as PSCustomObject.

            $outProps = [ordered] @{}

            $InputObject.GetEnumerator().ForEach{

                $outProps[ $_.Key ] = if( $_.Value -is [Collections.IDictionary] ) {
                    # Recursion. Output property will either be an object or an array.
                    ConvertFrom-TreeHashTablesToArrays $_.Value
                }
                else {
                    # Output property is a scalar value.
                    $_.Value
                }
            }

            [PSCustomObject] $outProps
        }
    }
}

Usage example:

$example = ConvertFrom-Json @'
{
  "a": {
    "p1": "value 1",
    "c": "value c",
    "d": {
      "e": "value e",
      "p2": "value 3"
    },
    "f": [
      {
      "g": "value ga",
      "p1": "value 4a"
      },
      {
      "g": "value gb",
      "p1": "value 4b"
      }
    ]
  },
  "p2": "value 2",
  "b": "value b"
}
'@

$exclude = "p1", "p2"

$clean = ConvertTo-FlatObjectValues $example |  # Step 1: unroll properties 
         Where-Object Name -notin $exclude |    # Step 2: filter
         ConvertFrom-FlatObjectValues           # Step 3: rebuild object

$clean | ConvertTo-Json -Depth 9

Output:

{
  "a": {
    "c": "value c",
    "d": {
      "e": "value e"
    },
    "f": [
      {
        "g": "value ga"
      },
      {
        "g": "value gb"
      }
    ]
  },
  "b": "value b"
}

Usage Notes:

  • Child objects are removed if they don't contain any properties after filtering. Empty arrays are removed as well. You can prevent this by passing -KeepEmptyObjects and/or -KeepEmptyArrays to function ConvertTo-FlatObjectValues.
  • If the input JSON is an array at the root level, make sure to pass it as an argument to ConvertTo-FlatObjectValues, instead of piping it (which would unroll it and the function would no longer know it's an array).
  • Filtering can also be done on the whole path of a property. E.g. to remove the p1 property only within the a object, you could write Where-Object Path -ne a.p1. To see what the paths look like, just call ConvertTo-FlatObjectValues $example, which outputs the flat list of properties and array elements:
    Path      Name Value
    ----      ---- -----
    a.p1      p1   value 1
    a.c       c    value c
    a.d.e     e    value e
    a.d.p2    p2   value 3
    a.f.#0.g  g    value ga
    a.f.#0.p1 p1   value 4a
    a.f.#1.g  g    value gb
    a.f.#1.p1 p1   value 4b
    p2        p2   value 2
    b         b    value b
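
The note about root-level arrays comes down to how the pipeline enumerates collections. The following is a minimal, self-contained sketch of that behavior. Test-IsArrayInput is a hypothetical helper written only for this illustration and is not part of the functions above; it simply reports whether each input it receives is a list.

```powershell
# Hypothetical helper (illustration only): reports whether the input it
# receives is an array/list, using the same parameter style as
# ConvertTo-FlatObjectValues.
function Test-IsArrayInput {
    param( [Parameter(Mandatory, ValueFromPipeline)] $InputObject )
    process { $InputObject -is [System.Collections.IList] }
}

$rootArray = ConvertFrom-Json '[{"g":"value ga"},{"g":"value gb"}]'

# Piping enumerates the array: the function runs once per element and never sees the array itself.
$piped = @( $rootArray | Test-IsArrayInput )                    # False, False

# Passing it as an argument binds the whole array to -InputObject in a single call.
$asArgument = @( Test-IsArrayInput -InputObject $rootArray )    # True
```

This is why a root-level array should be handed to ConvertTo-FlatObjectValues as an argument: only then can the function detect the array and emit "#n" path segments for its elements.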
    

Implementation Notes:

  • During unrolling ConvertTo-FlatObjectValues creates separate path segments (keys) for array elements which look like "#n" where n is the array index. This allows us to treat arrays and objects more uniformly, when rebuilding the object in ConvertFrom-FlatObjectValues.

  • ConvertFrom-FlatObjectValues first creates nested hashtables for all objects and arrays in its process section. This makes it easy to recollect properties into their respective objects. In this part of the code there is still no special treatment of arrays. The intermediate result now looks like this:

    {
      "a": {
        "c": "value c",
        "d": {
          "e": "value e"
        },
        "f": {
          "#0": {
            "g": "value ga"
          },
          "#1": {
            "g": "value gb"
          }
        }
      },
      "b": "value b"
    }
    
  • Only in the end section of ConvertFrom-FlatObjectValues are the arrays rebuilt from the hashtables, which is done by function ConvertFrom-TreeHashTablesToArrays. It turns hashtables that have keys starting with "#" back into actual arrays. Because of filtering, the indices might be non-consecutive, so for the given use case it would be enough to collect the values and ignore the indices. The indices are nevertheless sorted numerically, which makes the function more robust and supports elements that are received in any order.

  • Recursion in PowerShell functions is comparatively slow, because of the parameter-binding overhead. If performance is paramount, the code should be rewritten in inline C# or use data structures like Collections.Queue to avoid recursion (at the expense of code readability).
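
As a rough illustration of the last point, here is a minimal sketch of a non-recursive traversal that uses a System.Collections.Queue. It is a different technique from the unroll/filter/rebuild functions above: Remove-JsonProperty is a hypothetical name, the sketch modifies the input object in place, and it assumes the object was produced by ConvertFrom-Json (PSCustomObject nodes and arrays). Unlike the functions above, it does not remove objects or arrays that end up empty, and it can only filter by property name, not by path.

```powershell
# Hypothetical helper (not part of the answer's functions): removes the named
# properties in place, walking the tree with a queue instead of recursion.
function Remove-JsonProperty {
    param(
        [Parameter(Mandatory)] $InputObject,
        [Parameter(Mandatory)] [string[]] $Name
    )

    $queue = [System.Collections.Queue]::new()
    $queue.Enqueue( $InputObject )

    while( $queue.Count -gt 0 ) {
        $node = $queue.Dequeue()

        if( $node -is [System.Collections.IList] ) {
            # Array: schedule every element for a visit.
            foreach( $item in $node ) { $queue.Enqueue( $item ) }
        }
        elseif( $node -is [System.Management.Automation.PSCustomObject] ) {
            # Object: iterate over a snapshot of the properties, because we remove while looping.
            foreach( $prop in @( $node.PSObject.Properties ) ) {
                if( $prop.Name -in $Name ) {
                    $node.PSObject.Properties.Remove( $prop.Name )
                }
                else {
                    # Kept property: schedule its value for a visit.
                    $queue.Enqueue( $prop.Value )
                }
            }
        }
        # Scalars and $null need no processing.
    }
}

$example = ConvertFrom-Json '{"a":{"p1":"value 1","f":[{"g":"value ga","p1":"value 4a"}]},"p2":"value 2","b":"value b"}'
Remove-JsonProperty -InputObject $example -Name 'p1', 'p2'
$json = ConvertTo-Json $example -Depth 9 -Compress
# $json is now {"a":{"f":[{"g":"value ga"}]},"b":"value b"}
```

Whether this is actually faster than the recursive functions depends on the input and would need to be measured.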
