Understanding jq JOIN()-CodePudding

I'm trying to understand JOIN() builtin of jq.

From the jq manual (https://stedolan.github.io/jq/manual):

JOIN($idx; stream; idx_expr; join_expr):

This builtin joins the values from the given stream to the given index. 
The index's keys are computed by applying the given index expression to each value from the given stream. 
An array of the value in the stream and the corresponding value from the index is fed to the given join expression to produce each result.

I find this hard to understand without examples.

Could you give some examples of its usage demonstrating how it works?

CodePudding user response：

This function should resemble a JOIN clause in SQL. It is used to combine rows from two (or more in SQL) tables, based on a related column between them.

Let's build some "tables".

The first one should be a list of orders with an ID, and ID references to the ordering Customer and the ordered Product:

[
  {
    "OrderID": "10",
    "CustomerIDRef": "2",
    "ProductIDRef": "7"
  },
  {
    "OrderID": "11",
    "CustomerIDRef": "1",
    "ProductIDRef": "7"
  },
  {
    "OrderID": "12",
    "CustomerIDRef": "2",
    "ProductIDRef": "14"
  },
  {
    "OrderID": "13",
    "CustomerIDRef": "2",
    "ProductIDRef": "7"
  }
]
as $orders

Let the second one be the list of customers mapped to their name:

[
  {
    "CustomerID": "1",
    "CustomerName": "Alfred"
  },
  {
    "CustomerID": "2",
    "CustomerName": "Bill"
  },
  {
    "CustomerID": "3",
    "CustomerName": "Caroline"
  }
]
as $customers

As jq's JOIN only handles two tables at once (for more, you need to cascade), let's ignore the missing Products table.

Before we can get to JOIN we need to look at INDEX first, which turns an array like our tables from above into an object with the table's "primary keys" as field names. This is reasonable as field names are unique, rendering a lookup to always return not more than one record.

INDEX($customers[]; .CustomerID)

{
  "1": {
    "CustomerID": "1",
    "CustomerName": "Alfred"
  },
  "2": {
    "CustomerID": "2",
    "CustomerName": "Bill"
  },
  "3": {
    "CustomerID": "3",
    "CustomerName": "Caroline"
  }
}

Demo

Now, we can easily perform a JOIN operation between Orders (as the "left table") and their Customers (as the "right table"). Providing the "right table" as an INDEXed object, the "left table" as a stream .[], and the "related column" as field in the left table's objects that is matched with the right table's primary key (field name in the lookup object), we get: (let the last parameter be just . for now)

JOIN(INDEX($customers[]; .CustomerID); $orders[]; .CustomerIDRef; .)

[
  {
    "OrderID": "10",
    "CustomerIDRef": "2",
    "ProductIDRef": "7"
  },
  {
    "CustomerID": "2",
    "CustomerName": "Bill"
  }
]
[
  {
    "OrderID": "11",
    "CustomerIDRef": "1",
    "ProductIDRef": "7"
  },
  {
    "CustomerID": "1",
    "CustomerName": "Alfred"
  }
]
[
  {
    "OrderID": "12",
    "CustomerIDRef": "2",
    "ProductIDRef": "14"
  },
  {
    "CustomerID": "2",
    "CustomerName": "Bill"
  }
]
[
  {
    "OrderID": "13",
    "CustomerIDRef": "2",
    "ProductIDRef": "7"
  },
  {
    "CustomerID": "2",
    "CustomerName": "Bill"
  }
]

Demo

As you can see, we get a stream of arrays, one for each order. Each array has two elements: the record from the left table and the one from the right. An unsuccessful lookup would yield null on the right side.

Finally, the fourth parameter being the "join expression" describes how to join the two matching records, which esentially acts as a map.

JOIN(INDEX($customers[]; .CustomerID); $orders[]; .CustomerIDRef;
  "\(.[0].OrderID): \(.[1].CustomerName) ordered Product #\(.[0].ProductIDRef)."
)

10: Bill ordered Product #7.
11: Alfred ordered Product #7.
12: Bill ordered Product #14.
13: Bill ordered Product #7.

Demo