semantics of map on a sequence of objects in jq

Time:02-12

Suppose I have a file fruit.json containing the following lines:

[
  {
    "name": "apple",
    "color": "red",
    "price": 20
  },
  {
    "name": "banana",
    "color": "yellow",
    "price": 15
  },
  {
    "name": "pineapple",
    "color": "orange",
    "price": 53
  }
]

If I run jq '. | map(.)' fruit.json, I get the original data back. That's expected: the . inside map refers to each element of the input array.

However if I do jq '.[] | map(.)' fruit.json then I get this:

[
  "apple",
  "red",
  20
]
[
  "banana",
  "yellow",
  15
]
[
  "pineapple",
  "orange",
  53
]

Can someone please explain what's going on? Specifically,

  1. The [] after . strips away the brackets from the input array. Do we have a name for the [] operator? The manual seems to treat it as something very basic without definition.
  2. Do we have a name for the resulting thing by appending [] to .? Obviously it's not an object. If we do jq '.[]' fruit.json we can see that it looks very similar to an array. But apparently it behaves quite differently.
  3. Why is it the case that the map function seems to go two levels inside instead of one? This is more obvious if we do jq '.[] | map(. | length)' fruit.json and see that the . inside the map function refers to the value part of an (object) element of the input array.

Thank you all in advance!

Answer:

.[] produces the values of the array or object given to it.

For example,

[ "a", "b", "c" ] | .[]

is equivalent to

[ "a", "b", "c" ] | .[0], .[1], .[2]

and produces three strings: a, b and c.
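To see this for yourself, here is a small sketch that should run in any POSIX shell, assuming jq is installed and on your PATH. The variable names are just for illustration, and the object literal is made up for the example.

```shell
# Assumes jq is on PATH. -n: run without reading input; -c: compact output.

# .[] on an array emits each element as a separate value.
array_values=$(jq -nc '[ "a", "b", "c" ] | .[]')
printf '%s\n' "$array_values"

# Spelling out the indexes emits the same three values.
indexed_values=$(jq -nc '[ "a", "b", "c" ] | .[0], .[1], .[2]')
printf '%s\n' "$indexed_values"

# .[] on an object emits the values and drops the keys.
object_values=$(jq -nc '{ "color": "red", "price": 20 } | .[]')
printf '%s\n' "$object_values"
```

The first two commands print the same three lines, and the third prints "red" followed by 20.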


map( ... )

is equivalent to

[ .[] | ... ]

This means that

map( . )    ≡    [ .[] | . ]    ≡    [ .[] ]

For an array, that means

map( . )    ≡    [ .[0], .[1], ... ]    ≡    .

For an object, that means

map( . )    ≡    [ .["key1"], .["key2"], ... ]
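A quick way to check these equivalences, assuming jq is installed; the inputs here are made-up literals, not the data from the question:

```shell
# Assumes jq is on PATH. -n: no input; -c: compact output.

# map(f) and [ .[] | f ] give the same result on an array.
mapped=$(jq -nc '[1, 2, 3] | map(. * 2)')
spelled_out=$(jq -nc '[1, 2, 3] | [ .[] | . * 2 ]')
printf '%s\n%s\n' "$mapped" "$spelled_out"

# On an array, map(.) gives back an equal array.
identity=$(jq -nc '[1, 2, 3] | map(.)')
printf '%s\n' "$identity"

# On an object, map(.) collects the values into an array; the keys are gone.
object_values=$(jq -nc '{ "color": "red", "name": "apple" } | map(.)')
printf '%s\n' "$object_values"
```

The last command is the case that surprised you: map applied to an object returns an array of that object's values.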

> The [] after . strips away the brackets from the input array.

There are no brackets. jq programs don't operate on JSON text, but on the data structures that text represents.

When given an array or object, .[] produces the values of the elements of that array or object.

> Do we have a name for the [] operator?

The docs call it the Array/Object Value Iterator, but it's really just a specific usage of the indexing operator.

The docs attach the name Array/Object Value Iterator to .[] specifically, but that's not quite accurate. What precedes [] doesn't have to be .; it can be any expression, but there must be one. That is what distinguishes it from the array construction operator.

In technical terms,

  • [] as a circumfix operator ([ EXPR ]) is the array construction operator, and
  • [] as a postfix operator (EXPR [ EXPR? ]) is the indexing operator, and it's specifically called the array/object value iterator when there's nothing in the brackets.
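The difference between the two uses can be sketched like this, assuming jq is installed. The "items" key is a made-up name for the example.

```shell
# Assumes jq is on PATH. -n: no input; -c: compact output.

# Postfix [] after an expression other than a bare dot: iterate over .items.
path_iter=$(jq -nc '{ "items": [1, 2] } | .items[]')
printf '%s\n' "$path_iter"

# Postfix [] after a parenthesized expression works too.
paren_iter=$(jq -nc '("abc" / "")[]')
printf '%s\n' "$paren_iter"

# Postfix with something inside the brackets: ordinary indexing.
indexed=$(jq -nc '{ "items": [1, 2] } | .items[0]')
printf '%s\n' "$indexed"

# Circumfix [ EXPR ]: collect the values EXPR produces into a new array.
collected=$(jq -nc '{ "items": [1, 2] } | [ .items[] ]')
printf '%s\n' "$collected"
```

So the postfix form takes a container apart, and the circumfix form builds one from whatever the inner expression produces.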

> Do we have a name for the resulting thing by appending [] to .? Obviously it's not an object. If we do jq '.[]' fruit.json we can see that it looks very similar to an array. But apparently it behaves quite differently.

We call that a stream.

I'm not sure what to call the components of the stream. I usually use "value".

For example,

"a", "b", "c"       # Produces a stream of three values.
"abc" / "" | .[]    # Same

When a stream is serialized to a file with one value per line (as you would get using -c), the format is called "JSON Lines", with .jsonl as the suggested file extension.
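Here is a rough sketch of that round trip, assuming jq and mktemp are available. It uses a shortened two-item version of your fruit.json written to a temporary directory, and the fruit.jsonl filename is just the suggested convention.

```shell
# Assumes jq is on PATH. Works in a throwaway directory.
tmpdir=$(mktemp -d)

cat > "$tmpdir/fruit.json" <<'EOF'
[
  {"name": "apple", "color": "red", "price": 20},
  {"name": "banana", "color": "yellow", "price": 15}
]
EOF

# .[] turns the single array into a stream; -c writes one value per line.
jq -c '.[]' "$tmpdir/fruit.json" > "$tmpdir/fruit.jsonl"
cat "$tmpdir/fruit.jsonl"

# jq reads a .jsonl file as a stream of inputs and runs the program once per value.
names=$(jq -c '.name' "$tmpdir/fruit.jsonl")
printf '%s\n' "$names"

# -s (slurp) gathers the whole stream back into one array.
round_trip=$(jq -c -s '.' "$tmpdir/fruit.jsonl")
printf '%s\n' "$round_trip"
```

Note that the stream itself is never a single JSON value; it only becomes one again when something like -s or [ ... ] collects it.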

> Why is it the case that the map function seems to go two levels inside instead of one? This is more obvious if we do jq '.[] | map(. | length)' fruit.json and see that the . inside the map function refers to the value part of an (object) element of the input array.

No, map goes just one level deep. The other level comes from the .[] in front of it.

In that example,

  • The .[] iterates over the values of the array.
  • The map iterates over the values of the objects.
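The two levels can be separated with a small experiment, assuming jq is installed. It uses a shortened two-item copy of your data held in a shell variable (the variable names are only for illustration).

```shell
# Assumes jq is on PATH. -c: compact output.
fruit='[{"name":"apple","color":"red","price":20},{"name":"banana","color":"yellow","price":15}]'

# map alone sees the array, so . inside map is each object;
# the length of an object is its number of keys.
outer=$(printf '%s' "$fruit" | jq -c 'map(length)')
printf '%s\n' "$outer"

# With .[] first, each object is handed to map separately, so . inside map
# is each value of that object. The length of a string is its number of
# characters, and the length of a number is its absolute value.
inner=$(printf '%s' "$fruit" | jq -c '.[] | map(length)')
printf '%s\n' "$inner"
```

The first command prints [3,3]; the second prints [5,3,20] and then [6,6,15]. In both cases map descends exactly one level from whatever it is given.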