Need to figure out an efficient way to query a table using another table as a filter/config (postgres 14.5).

The filter table has 4 levels that match 4 levels in the data. Each level can be read as a RegExp with wildcard, where wildcard is null. But the rules are MUTUALLY EXCLUSIVE. meaning rule aa-* excludes the rule aa-ab-* and vice versa.

e.g. Filter Table

ID	Category	Level 1	Level 2	Level 3	Level 4
Rule1	A	aa	null	null	null
Rule2	A	aa	ab	null	null
Rule3	A	ab	null	null	null
Rule4	A	ab	ac	aa	null

In this case filter rule #1 is matching all aa-* data, except when it's aa-ab-* (rule #2) Similarly, rule #3 will match ab-* data, except ab-ac-aa-*

e.g. Data Table

Data	Category	Level 1	Level 2	Level 3	Level 4
Data1	A	aa	aa	ac	aa
Data2	A	aa	aa	null	null
Data3	A	aa	ab	null	null
Data4	A	ab	ab	null	null
Data5	A	ab	ac	null	null
Data6	A	ab	ac	dd	null

This way the join between the two should produce result like:

Data ID	Rule ID
Data1	Rule1
Data2	Rule1
Data3	Rule2
Data4	Rule3
Data5	Rule3
Data6	Rule3

Approach toward combining all levels into a string with "level-combo" seem to be inefficient and has gaps. The join based on coalesce(lvl, '') is not helpful either, it's not addressing the wildcard rules.

where coalesce(a.lvl1,'') = coalesce(c.lvl1,'')
and coalesce(a.lvl2,'') = coalesce(c.lvl2,'')
and coalesce(a.lvl3,'') = coalesce(c.lvl3,'')
and coalesce(a.lvl4,'') = coalesce(c.lvl4,'')

DATA to replicate:

WITH config (id, category, lvl1, lvl2, lvl3, lvl4) AS (
  VALUES
   (1, 's', null,  null, null, null ),
   (2, 's', 'u7', null,  null, null ),
   (3, 's', 'u6', 'u1',  null, null ),
   (4, 's', 'u5', 'ud',  'u2', null ),
   (5, 's', 'u5', 'ud',  'u3', null ),
   (6, 's', 'u5', 'ud',  'u4', 'ok' ),
   (9, 's', 'u4', null,  null, null ),
   (7, 's', 'u4', 'u1',  'u2', 'u3' ),
   (8, 's', 'u4', 'cu',  'u2', null )
),
datum (id, data_id, internal_id, start_date, end_date, category, lvl1, lvl2, lvl3, lvl4) AS (
  VALUES
   (1,  'x1', '111',  '2022-01-01', '2022-12-01', 's', null,  null, null, null ),
   (2,  'x2', '112',  '2022-01-01', '2022-12-01', 's', 'u7', null,  null, null ),
   (3,  'x3', '113',  '2022-01-01', '2022-12-01', 's', 'u6', 'u1',  null, null ),
   (4,  'x4', '114',  '2022-01-01', '2022-12-01', 's', 'u5', 'ud',  'u2', null ),
   (5,  'x5', '115',  '2022-01-01', '2022-12-01', 's', 'u5', 'ud',  'u3', null ),
   (6,  'x6', '116',  '2022-01-01', '2022-12-01', 's', 'u5', 'ud',  'u4', 'ok' ),
   (9,  'x9', '119',  '2022-01-01', '2022-12-01', 's', 'u4', null,  null, null ),
   (7,  'x7', '117',  '2022-01-01', '2022-12-01', 's', 'u4', 'u1',  'u2', 'u3' ),
   (8,  'x8', '118',  '2022-01-01', '2022-12-01', 's', 'u4', 'cu',  'u2', null ),
   (9,  'x2', '112',  '2022-01-01', '2022-12-01', 's', 'u9', null,  null, null ),
   (10, 'x3', '113',  '2022-01-01', '2022-12-01', 's', 'u5', 'u1',  null, null ),
   (11, 'x4', '114',  '2022-01-01', '2022-12-01', 's', 'u5', 'dd',  'u2', null ),
   (12, 'x5', '115',  '2022-01-01', '2022-12-01', 's', 'u5', 'ud',  'u3', 'ck' ),
   (13, 'x6', '116',  '2022-01-01', '2022-12-01', 's', 'u5', 'ud',  'u4', 'no' )
  )
SELECT
   *
FROM config c
join datum d on c.category = d.category
and coalesce(c.lvl1, '') = coalesce(d.lvl1, '')
and ...  ;

CodePudding user response：

String concatenation maybe? (Note this is SSMS syntax, I don't know postgresql)

SELECT d.dataID, r.ruleID
FROM data d
LEFT JOIN rule r ON 

d.Level1   
COALESCE(d.Level2   '-')   
COALESCE(d.Level3   '-')   
COALESCE(d.Level4   '-') 

LIKE 
--% is any number of characters
'%'   r.Level1   
COALESCE(r.Level2   '-')   
COALESCE(r.Level3   '-')   
COALESCE(r.Level4   '-')   '%'

This will return all matches, so you'll also need to order your rules by priority, group the matches by dataID and return the minimum rule.

CodePudding user response：

This will only work if id is unique.

your second datum table has a double 9

This will sum up all hits between both tables and the highest score for a c id wins

WITH config (id, category, lvl1, lvl2, lvl3, lvl4) AS (
  VALUES
   (1, 's', null,  null, null, null ),
   (2, 's', 'u7', null,  null, null ),
   (3, 's', 'u6', 'u1',  null, null ),
   (4, 's', 'u5', 'ud',  'u2', null ),
   (5, 's', 'u5', 'ud',  'u3', null ),
   (6, 's', 'u5', 'ud',  'u4', 'ok' ),
   (9, 's', 'u4', null,  null, null ),
   (7, 's', 'u4', 'u1',  'u2', 'u3' ),
   (8, 's', 'u4', 'cu',  'u2', null )
),
datum (id, data_id, internal_id, start_date, end_date, category, lvl1, lvl2, lvl3, lvl4) AS (
  VALUES
   (1,  'x1', '111',  '2022-01-01', '2022-12-01', 's', null,  null, null, null ),
   (2,  'x2', '112',  '2022-01-01', '2022-12-01', 's', 'u7', null,  null, null ),
   (3,  'x3', '113',  '2022-01-01', '2022-12-01', 's', 'u6', 'u1',  null, null ),
   (4,  'x4', '114',  '2022-01-01', '2022-12-01', 's', 'u5', 'ud',  'u2', null ),
   (5,  'x5', '115',  '2022-01-01', '2022-12-01', 's', 'u5', 'ud',  'u3', null ),
   (6,  'x6', '116',  '2022-01-01', '2022-12-01', 's', 'u5', 'ud',  'u4', 'ok' ),
   (7,  'x9', '119',  '2022-01-01', '2022-12-01', 's', 'u4', null,  null, null ),
   (8,  'x7', '117',  '2022-01-01', '2022-12-01', 's', 'u4', 'u1',  'u2', 'u3' ),
   (9,  'x8', '118',  '2022-01-01', '2022-12-01', 's', 'u4', 'cu',  'u2', null ),
   (10,  'x2', '112',  '2022-01-01', '2022-12-01', 's', 'u9', null,  null, null ),
   (11, 'x3', '113',  '2022-01-01', '2022-12-01', 's', 'u5', 'u1',  null, null ),
   (12, 'x4', '114',  '2022-01-01', '2022-12-01', 's', 'u5', 'dd',  'u2', null ),
   (13, 'x5', '115',  '2022-01-01', '2022-12-01', 's', 'u5', 'ud',  'u3', 'ck' ),
   (14, 'x6', '116',  '2022-01-01', '2022-12-01', 's', 'u5', 'ud',  'u4', 'no' )
  ),
  SUM_lvl as (
SELECT
   c.id as C_id,d.id as d_id,
  (coalesce(c.lvl1, '')  = coalesce(d.lvl1, ''))::int  
    (coalesce(c.lvl2, '')  = coalesce(d.lvl2, ''))::int  
  (coalesce(c.lvl3, '')  = coalesce(d.lvl3, ''))::int  
    (coalesce(c.lvl3, '')  = coalesce(d.lvl3, ''))::int  as level
FROM config c
join datum d on c.category = d.category)
, selhigh as (SELECT 
c_id,d_id,level,
  ROW_NUMBER() OVER(PARTITION BY c_id ORDER BY level DESC,d_id ASC) rn
FROM SUM_lvl)
SELECT c_id,d_id FROM selhigh WHERE rn = 1

c_id	d_id
1	1
2	2
3	3
4	4
5	5
6	6
7	8
8	9
9	7

SELECT 9

fiddle