raw_data :
name | account_id | type | element_id | cost |
---|---|---|---|---|
First | 1 | type1 | element1 | 0.1 |
Second | 2 | type2 | element2 | 0.2 |
First | 11 | type2 | element11 | 0.11 |
components:
name | account_id (default = -1) | type (default = null) | element_id (default = null) | cost |
---|---|---|---|---|
First | -1 | null | null | 0.1 |
Second | 2 | type2 | null | 0.2 |
First | 11 | type2 | element11 | 0.11 |
I seek to check whether the cost logged in raw_data is the same as that in components for a given combination. They need to be joined on column name
.
Remaining fields in raw_data are always populated. In components, any row can be a combination of specific values and the default values.
I seek to match the columns from raw_data to components wherever I find a match and otherwise need to use the default value to get the cost
.
I failed with left join and union and IN.
E.g. For the first row in raw_data table with name "First", I do not have account_id = 1 in the components table. So I need to go with account_id = -1.
Match as many specific values as found in components, Otherwise resort to default values.
CodePudding user response:
If the ratio of records in the tables is not 1 to 1, then an unambiguous sample cannot be made. Also, if the selection condition is "Coincidence of at least one parameter", then it will also not work to make an unambiguous selection.
Below is an example that will bring you closer to solving the problem. It selects data that matches one of the selection criteria, however there may be duplicates!
Try this variant and report the result, possibly on a larger variant of samples and records
Maybe this will get you closer to the solution.
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost
from raw_data rd
left join components c on rd.name = c.name
where (c.account_id = rd.account_id or c.account_id = -1) OR
(c.type = rd.type OR c.type is null) OR
(c.element_id = rd.element_id OR c.element_id = null)
You can build the priority of checking values through union
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost
from raw_data rd
left join components c on rd.name = c.name
where c.account_id = rd.account_id and
c.type = rd.type
c.element_id = rd.element_id
union
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost
from raw_data rd
left join components c on rd.name = c.name
where c.account_id = rd.account_id and
c.type = rd.type
union
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost
from raw_data rd
join components c on rd.name = c.name
where c.account_id = rd.account_id
etc
Without seeing all the problems, all the data options in the tables, it is difficult to give the right solution, which may not be...
CodePudding user response:
I think one way you could do this is something like:
SELECT *
FROM
(
SELECT rd.name, rd.account_id, rd.type, rd.element_id, rd.cost raw_cost, c.account_id component_account_id, c.type component_type, c.element_id component_element_id, c.cost component_cost,
row_number() OVER (PARTITION BY rd.name, rd.account_id, rd.type, rd.element_id
ORDER BY
CASE WHEN c.account_id <> -1 THEN 1 END
CASE WHEN c.type IS NOT NULL THEN 1 END
CASE WHEN c.element_id IS NOT NULL THEN 1 END DESC) rd
FROM raw_data rd LEFT OUTER JOIN components c
ON rd.name = c.name
AND (rd.account_id = c.account_id or c.account_id = -1)
AND (rd.type = c.type OR c.type IS NULL)
AND (rd.element_id = c.element_id OR c.element_id IS NULL)
) iq
WHERE rd = 1
The idea here is to match on an actual match or the default. Then the row_number
window function is used to prioritize the matches based on a count of how many columns actually matched (you said you don't care about ties, so this doesn't handle that). The outer query throws away the matches that aren't the best.
With the sample data above, this could be an inner join
, but I left it as a left join
since that's what was mentioned.
Here's a fiddle of it working. Hopefully this is close to what you want.