Match specific or default value on multiple columns-CodePudding

raw_data :

name	account_id	type	element_id	cost
First	1	type1	element1	0.1
Second	2	type2	element2	0.2
First	11	type2	element11	0.11

components:

name	account_id (default = -1)	type (default = null)	element_id (default = null)	cost
First	-1	null	null	0.1
Second	2	type2	null	0.2
First	11	type2	element11	0.11

I seek to check whether the cost logged in raw_data is the same as that in components for a given combination. They need to be joined on column name. Remaining fields in raw_data are always populated. In components, any row can be a combination of specific values and the default values.

I seek to match the columns from raw_data to components wherever I find a match and otherwise need to use the default value to get the cost. I failed with left join and union and IN.

E.g. For the first row in raw_data table with name "First", I do not have account_id = 1 in the components table. So I need to go with account_id = -1.

Match as many specific values as found in components, Otherwise resort to default values.

CodePudding user response：

If the ratio of records in the tables is not 1 to 1, then an unambiguous sample cannot be made. Also, if the selection condition is "Coincidence of at least one parameter", then it will also not work to make an unambiguous selection.

Below is an example that will bring you closer to solving the problem. It selects data that matches one of the selection criteria, however there may be duplicates!

Try this variant and report the result, possibly on a larger variant of samples and records

Maybe this will get you closer to the solution.

select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost 
from raw_data rd
left join components c on rd.name = c.name
where (c.account_id = rd.account_id or c.account_id = -1) OR
(c.type = rd.type OR c.type is null) OR
(c.element_id = rd.element_id OR c.element_id = null)

You can build the priority of checking values through union

select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost 
from raw_data rd
left join components c on rd.name = c.name
where c.account_id = rd.account_id and 
c.type = rd.type
c.element_id = rd.element_id
union
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost 
from raw_data rd
left join components c on rd.name = c.name
where c.account_id = rd.account_id and 
c.type = rd.type
union
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost 
from raw_data rd
join components c on rd.name = c.name
where c.account_id = rd.account_id
etc

Without seeing all the problems, all the data options in the tables, it is difficult to give the right solution, which may not be...

CodePudding user response：

I think one way you could do this is something like:

SELECT *
FROM
(
SELECT rd.name, rd.account_id, rd.type, rd.element_id, rd.cost raw_cost, c.account_id component_account_id, c.type component_type, c.element_id component_element_id, c.cost component_cost,
  row_number() OVER (PARTITION BY rd.name, rd.account_id, rd.type, rd.element_id
   ORDER BY 
    CASE WHEN c.account_id <> -1 THEN 1 END
      CASE WHEN c.type IS NOT NULL THEN 1 END
      CASE WHEN c.element_id IS NOT NULL THEN 1 END DESC) rd
FROM raw_data rd LEFT OUTER JOIN components c
  ON rd.name = c.name
   AND (rd.account_id = c.account_id or c.account_id = -1)
   AND (rd.type = c.type OR c.type IS NULL)
   AND (rd.element_id = c.element_id OR c.element_id IS NULL)
) iq
WHERE rd = 1

The idea here is to match on an actual match or the default. Then the row_number window function is used to prioritize the matches based on a count of how many columns actually matched (you said you don't care about ties, so this doesn't handle that). The outer query throws away the matches that aren't the best.

With the sample data above, this could be an inner join, but I left it as a left join since that's what was mentioned.

Here's a fiddle of it working. Hopefully this is close to what you want.