I have 2 tables which I would like to query and display data differences:
CREATE TABLE order_splits_config (
id INT,
pair_id INT
);
CREATE TABLE active_pairs (
id INT,
pair VARCHAR(30),
exchange_active boolean,
exchange_id INT
);
INSERT INTO order_splits_config(id, pair_id)
VALUES (1, 83);
INSERT INTO order_splits_config(id, pair_id)
VALUES (2, 58);
INSERT INTO order_splits_config(id, pair_id)
VALUES (34, 34);
INSERT INTO active_pairs(id, pair, exchange_active, exchange_id)
VALUES (1, 'US/EN', true, 2);
INSERT INTO active_pairs(id, pair, exchange_active, exchange_id)
VALUES (2, 'GB/UK', true, 3);
INSERT INTO active_pairs(id, pair, exchange_active, exchange_id)
VALUES (2, 'FR/EU', true, 4);
I use this query to query the differences:
SELECT b.id, b.pair, b.exchange_id
FROM order_splits_config a
FULL OUTER JOIN active_pairs b
ON a.pair_id = b.id
WHERE a.pair_id IS NULL
OR b.id IS NULL
AND b.exchange_active = 'true';
This prints a lot of lines like this for example (data is just for example):
#,pair,id,exchange_id
1, US/EN,332,1
2, GB/UK,112,1
3, GB/UK,113,1
4, FR/EU,221,5
5, FR/EU,183,2
...
How can I use DISTINCT
in order to get the result from the query unique by pair
?
FIDDLE: https://www.db-fiddle.com/f/4D6VfqysPCWhQnh8zaFBps/2
CodePudding user response:
You only select columns from table active_pairs
to begin with.
After joining on a.pair_id = b.id
, the condition a.pair_id IS NULL
is only true if there is no matching row in table order_splits_config AS a
, which is more efficiently formulated as NOT EXISTS
instead of the FULL JOIN
.
As for b.id IS NULL AND b.exchange_active = 'true'
: again, b.id
can only be NULL
for rows of a
where no matching b
is found - in which case b.exchange_active
can never be true. So the clause effectively eliminates all rows of a
that were just added by the FULL JOIN
and which would show up in the result as all NULL values since there are only columns from b
. In short: don't FULL JOIN
to begin with.
If active_pairs.pair
is defined UNIQUE
, it boils down to:
SELECT id, pair, exchange_id
FROM active_pairs b
WHERE NOT EXISTS (SELECT FROM order_splits_config a WHERE a.pair_id = b.id);
If active_pairs.pair
is not defined UNIQUE
, the simple solution is with DISTINCT ON
:
SELECT DISTINCT ON (pair)
id, pair, exchange_id
FROM active_pairs b
WHERE NOT EXISTS (SELECT FROM order_splits_config a WHERE a.pair_id = b.id);
db<>fiddle here
From each set of duplicates on pair
, you get an arbitrary pick.
For a deterministic pick, define what to pick and add an ORDER BY
clause accordingly. Example: to get the one with the smallest id
, add:
...
ORDER BY pair, id;
Also adds a sort order to the so far unsorted result.
If order_splits_config
isn't trivially small, have an index on (pair_id)
to make this fast.
There may be much faster solutions for a big active_pairs
table, depending on undisclosed information, mostly the cardinality of column pair
(how many duplicate values).
See: