Home > OS >  How to prevent ConvertFrom-Json from collapsing nested arrays if there's 1 element
How to prevent ConvertFrom-Json from collapsing nested arrays if there's 1 element

Time:07-20

Consider these two JSON files:

{
  "tables": [
    {
      "columns": [
        {
          "name": "action",
          "type": "string"
        },
        {
          "name": "region",
          "type": "string"
        },
        {
          "name": "count_",
          "type": "long"
        },
        {
          "name": "min_timestamp",
          "type": "datetime"
        },
        {
          "name": "max_timestamp",
          "type": "datetime"
        }
      ],
      "name": "PrimaryResult",
      "rows": [
        [
          "ChangePassword",
          "CN",
          1,
          "2022-07-19T11:51:52.8430729Z",
          "2022-07-19T11:51:52.8430729Z"
        ]
      ]
    }
  ]
}
{
  "tables": [
    {
      "columns": [
        {
          "name": "action",
          "type": "string"
        },
        {
          "name": "region",
          "type": "string"
        },
        {
          "name": "count_",
          "type": "long"
        },
        {
          "name": "min_timestamp",
          "type": "datetime"
        },
        {
          "name": "max_timestamp",
          "type": "datetime"
        }
      ],
      "name": "PrimaryResult",
      "rows": [
        [
          "ChangePassword",
          "CN",
          1,
          "2022-07-19T11:51:52.8430729Z",
          "2022-07-19T11:51:52.8430729Z"
        ],
        [
          "Register",
          "CN",
          1,
          "2022-07-19T11:51:52.8430729Z",
          "2022-07-19T11:51:52.8430729Z"
        ]
      ]
    }
  ]
}

They have the exact same schema and are almost the same. The difference is that first file has 1 row in the rows property, the second file has 2 rows in the rows property.


I use ConvertFrom-Json to load the files into PowerShell. When I start accessing .tables.rows[0], I expect that it will return the first row.

However, in the first file that has only 1 row, it actually returns the first column of the first row.

(gc file1.json | ConvertFrom-Json).tables.rows[0]

Outputs:

ChangePassword

If there are more than 1 row as in the second file, then the .tables.rows[0] behaves as expected.

(gc file2.json | ConvertFrom-Json).tables.rows[0]

Outputs:

ChangePassword
CN
1
2022-07-19T11:51:52.8430729Z
2022-07-19T11:51:52.8430729Z

How can I reliably process these JSON files in PowerShell regardless of if they have 1 row or multiple rows in them?

btw. These are actually JSON files produced by az cli as results from Azure Application Insights queries.

CodePudding user response:

tl;dr

You need to avoid member-access enumeration if you want to access collection-valued properties as-is, which in your case means the following, as nimizen demonstrated in a comment:

# Note the [0] after .tables
# (-Raw was added for efficiency)
(gc -Raw file1.json | ConvertFrom-Json).tables[0].rows[0]

Because your tables property is itself an array, accessing just .tables results in member-access enumeration (even though the array happens to have just one element).
Using index [0] to target that one element explicitly allows access to its (one and only) .rows property without the latter's value being subject to implicit enumeration, as explained in the next section.


You're seeing a surprising aspect of PowerShell's member-access enumeration feature:

The values obtained from the members of a collection's elements are emitted as if they were sent to the pipeline, one by one, which has the following implications:

  • As happens by default in the pipeline, any such value is enumerated if it happens to be collection-valued, i.e. its elements are emitted, one by one.

  • On accessing the resulting output, if there's only one output object (whether or not that happens to be a collection itself), it is captured as-is; if there are two or more, they are implicitly collected in a regular PowerShell array (of type [object[]]).

The upshot for enumerated member values that happen to contain collections is:

  • The input collections themselves are always lost, because only their elements are output.

  • If there are multiple collection-valued member values, you'll end up with a single, flat array, invariably of type [object[]], that is the concatenation of the elements of all collections involved.

  • If there is only one collection-valued member value, and that value happens to be a single-element collection, you'll end up with that single element as-is (only with multiple elements would you end up with an [object[]] array).

This surprising behavior is the subject of GitHub issue #6802, though note that the behavior is unlikely to change, so as not to break backward compatibility.


A simplified example to demonstrate what happened in your case:

Note: JSON is incidental to the problem, so I'm using a [pscustomobject] instance below.

# Construct a single-element array that has a [pscustomobject] element.
# whose .prop property contains a single-element array whose only
# element is another array, typed [int[]].
$array = , [pscustomobject] @{ prop = , [int[]] (1, 2) }

# Use member-access enumeration to access the .prop member value
# of all elements in the array (here, there's only one), and get
# the first element.
$array.prop[0] # !! -> scalar 1, i.e. the first elem. of the *nested* array,
               # !! because member-access enumeration *enumerated* the 
               # !! single-element outer array.

# Without member-access enumeration, the value of .prop is accessed as-is,
# as a nested array:
$array[0].prop[0] # -> [int[]] (1, 2), as expected.

CodePudding user response:

don't know how to stop powershell behaving that way, but got it "reliable" by

$test=Get-Item ".\file.json" | get-content | convertfrom-json
if($test.tables.rows[0].count -gt 1 ){
    $result = $test.tables.rows[0][0] 
}else{
    $result = $test.tables.rows[0]
}

  • Related