Sql: Join separately ordered tables

Time:11-24

Let's assume I have two sets of events:

  • Foo
  • Bar

where I would always expect Bar to follow Foo: Foo -> Bar. I have a table of Foo values:

|----|---------------|------|
| id | ordering_foo  | other|
|----|---------------|------|
|1   |1              |X     |
|1   |2              |Y     |
|----|---------------|------|
|2   |1              |X     |
|----|---------------|------|
|3   |2              |X     |
|----|---------------|------|
|4   |1              |X     |
|4   |2              |Y     |
|----|---------------|------|

The ordering field indicates the order in which the Foo events happened for each id.

I also have a set of Bar events:

|----|---------------|-------|
| id | ordering_bar  | other |
|----|---------------|-------|
|1   |A              |XX     |
|1   |B              |YY     |
|----|---------------|-------|
|3   |B              |XX     |
|----|---------------|-------|
|4   |A              |XX     |
|----|---------------|-------|

Note that:

  • While Foo and Bar are both ordered, they don't share the same ordering, so we can't simply join them on the ordering values. Here I have simplified them to numbers vs. strings. In the problem that inspired this question, they are the timestamps of each Foo/Bar event, which have the property foo.ordering < bar.ordering for a Foo -> Bar sequence of events, but that's probably not very helpful for this problem.
  • The ordering can have gaps, i.e. just because we have an order entry of 2 (B) doesn't mean we necessarily have a 1 (A) entry; see the entries for id 3.
  • It's possible to have a record for Foo but not the subsequent Bar; see the entries for ids 2 and 4.

I want to end up with:

|----|----------|-----------|-----------|
| id | ordering | other-foo | other-bar |
| 1  | 1        | X         | XX        |
| 1  | 2        | Y         | YY        |           
|----|----------|-----------|-----------|
| 2  | 1        | X         | null      |           
|----|----------|-----------|-----------|
| 3  | 2        | X         | XX        |
|----|----------|-----------|-----------|
| 4  | 1        | X         | XX        |
| 4  | 2        | Y         | null      |           
|----|----------|-----------|-----------|

How can I get there? In my special case of this problem I only ever have two possible events per event type, per id, i.e. the ordering values can only ever be 1, 2 / A, B. I played around with things like:

case
  when count(*) over (partition by foo.id) = 1 and count(*) over (partition by bar.id) = 1 then foo.ordering_foo
  when count(*) over (partition by foo.id) = 2 and count(*) over (partition by bar.id) = 1 then 1
  when count(*) over (partition by foo.id) = 2 and count(*) over (partition by bar.id) = 2 and max(bar.ordering_bar) over (partition by bar.id) = bar.ordering_bar then 2
  when count(*) over (partition by foo.id) = 2 and count(*) over (partition by bar.id) = 2 and min(bar.ordering_bar) over (partition by bar.id) = bar.ordering_bar then 1
  else -1
end as ordering, 

ie, I treat each case of:

  • 1 foo, 1 bar
  • 2 foo, 1 bar
  • 2 foo, 2 bar

separately to come up with a composite order. It is likely error-prone, though, and most importantly I realise this is:

  • horrible to read/maintain
  • not flexible enough.
  • hard to use to get other fields.

So I'm curious if you could solve this more elegantly in the generic case.

User response:

You can number the rows of each table per id with ROW_NUMBER and join on that number, as follows:

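SELECT T.id, T.ordering_foo, T.other AS other_foo, D.other AS other_bar
FROM
(
  -- Rank each Foo event within its id (1 = first)
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_foo) AS foo_rn
  FROM foo
) T
LEFT JOIN
(
  -- Rank each Bar event within its id in the same way
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_bar) AS bar_rn
  FROM bar
) D
-- Pair the n-th Foo with the n-th Bar of the same id; a Foo without a Bar keeps NULL
ON T.id = D.id AND T.foo_rn = D.bar_rn
ORDER BY T.id, T.ordering_foo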

See a demo on SQL Server.
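If you want to check the query locally against the sample data from the question, here is a minimal sketch. It is not part of the original demo: it assumes Python's built-in sqlite3 module with an SQLite build that supports window functions (3.25 or later), and the column types in the CREATE TABLE statements are my own guess, since the question doesn't state them.

```python
import sqlite3

# In-memory database; window functions need SQLite 3.25+ (assumption).
conn = sqlite3.connect(":memory:")

# Sample data copied from the question; column types are assumed.
conn.executescript("""
    CREATE TABLE foo (id INTEGER, ordering_foo INTEGER, other TEXT);
    CREATE TABLE bar (id INTEGER, ordering_bar TEXT, other TEXT);
    INSERT INTO foo VALUES
        (1, 1, 'X'), (1, 2, 'Y'), (2, 1, 'X'),
        (3, 2, 'X'), (4, 1, 'X'), (4, 2, 'Y');
    INSERT INTO bar VALUES
        (1, 'A', 'XX'), (1, 'B', 'YY'), (3, 'B', 'XX'), (4, 'A', 'XX');
""")

# Same join as above: the n-th Foo of an id is paired with the n-th Bar.
query = """
    SELECT T.id, T.ordering_foo, T.other AS other_foo, D.other AS other_bar
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_foo) AS foo_rn
        FROM foo
    ) AS T
    LEFT JOIN (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_bar) AS bar_rn
        FROM bar
    ) AS D
      ON T.id = D.id AND T.foo_rn = D.bar_rn
    ORDER BY T.id, T.ordering_foo
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # a missing Bar shows up as None (SQL NULL)
```

The six rows this returns should line up with the result table in the question, with None in place of null for id 2 and for the second Foo of id 4. Note that the gap for id 3 (only 2 and B present) is handled because the join uses the per-id row number rather than the raw ordering values.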
