Let's assume I have two sets of events:
Foo
Bar
where I would always expect Bar
to follow Foo
: Foo -> Bar
. I have a table of Foo
values:
|----|---------------|------|
| id | ordering-foo | other|
|----|---------------|------|
|1 |1 |X |
|1 |2 |Y |
|----|---------------|------|
|2 |1 |X |
|----|---------------|------|
|3 |2 |X |
|----|---------------|------|
|4 |1 |X |
|4 |2 |Y |
|----|---------------|------|
the ordering
field indicates the order at which the Foo
events happened per id.
I also have a set of Bar
events:
|----|---------------|-------|
| id | ordering_bar | other |
|----|---------------|-------|
|1 |A |XX |
|1 |B |YY |
|----|---------------|-------|
|3 |B |XX |
|----|---------------|-------|
|4 |A |XX |
|----|---------------|-------|
Note that:
- while
Foo
andBar
are both ordered, they don't share the same ordering and we can't simply join them on the said ordering values. Here I have simplified them to numbers vs strings. In the problem that inspired this question, these are the timestamps for each Foo/Bar event respectively, which has the property offoo.ordering < bar.ordering
for aFoo->Bar
sequence of events, but that's probably not massively helpful to this problem. - The ordering isn't "???", ie just because we have an order entry of
2
(B
) doesn't mean we'd necessarily have a1
(A
) entry. see entries for id:3
- It's possible for us to have a record for
Foo
but not the subsequentBar
, ie see entries for id:2, 4
I want to end up with:
|----|----------|-----------|-----------|
| id | ordering | other-foo | other-bar |
| 1 | 1 | X | XX |
| 1 | 2 | Y | YY |
|----|----------|-----------|-----------|
| 2 | 1 | X | null |
|----|----------|-----------|-----------|
| 3 | 2 | X | XX |
|----|----------|-----------|-----------|
| 4 | 1 | X | XX |
| 4 | 2 | Y | null |
|----|----------|-----------|-----------|
How can I get there? In my special case of this problem I only ever have two possible events per event type, per id. ie the ordering values can only ever be: 1,2 / A,B
I played around with things like:
case
when count(*) over (partition by foo.id) = 1 and count(*) over (partition by bar.id) = 1 then foo.ordering_foo
when count(*) over (partition by foo.id) = 2 and count(*) over (partition by bar.id) = 1 then 1
when count(*) over (partition by foo.id) = 2 and count(*) over (partition by bar.id) = 2 and max(bar.ordering_bar) over (partition by bar.id) = bar.ordering_bar then 2
when count(*) over (partition by foo.id) = 2 and count(*) over (partition by bar.id) = 2 and min(bar.ordering_bar) over (partition by bar.odering_bar)= bar.ordering_bar then 1
else -1
end as ordering,
ie, I treat each case of:
- 1 foo, 1 bar
- 2 foo, 1 bar
- 2 foo, 2 bar
separately to com up with a composite order. Tho it is likely error-prone, and most importantly I realise this is:
- horrible to read/maintain
- not flexible enough.
- hard to use to get
other
fields.
So I'm curious if you could solve this more elegantly in the generic case.
CodePudding user response:
You may join the tables using ROW_NUMBER
as the following:
SELECT T.id ,T.ordering_foo, T.other other_foo, D.other other_bar
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_foo) foo_rn
FROM foo
) T
LEFT JOIN
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_bar) bar_rn
FROM bar
) D
ON T.ID=D.ID AND T.foo_rn=D.bar_rn
ORDER BY T.id ,T.ordering_foo
See a demo on SQL Server.