I have 2 Tables. Table 1 holds ID1 and ID2. Table 2 holds ID2 and ID3.
Table 1 has unique cases for ID1 and multiple cases for ID2.
TABLE 1:
ID1 | ID2
1 1
2 2
3 3
4 3
5 4
6 5
7 5
8 6
9 7
10 6
Table 2 has unique cases for ID2 and multiple cases for ID3
TABLE 2:
ID2 | ID3
1 1
2 1
3 2
4 3
5 2
6 4
7 5
I want 1 unique case of ID3.
I need remove duplicate ID2s from Table 1 picking to remove the duplicate ID2s based on the smaller ID1
So Table 1 now looks like:
TABLE 1:
ID1 | ID2
1 1
2 2
4 3
5 4
7 5
9 7
10 6
Now I want to go to Table 2 and remove any duplicate ID3s based on the smaller ID2
TABLE 2:
ID2 | ID3
2 1
4 3
5 2
6 4
7 5
So my end result should be (I am joining the tables because both of them have other relevant information I need to combine but these are the IDs I am sorting and filtering to get the correct row):
Final Table:
ID1 | ID2 | ID3
2 2 1
7 5 2
5 4 3
10 6 4
9 7 5
Where now I have a single case for each ID3 based on the largest ID1 and ID2 associated with that ID3.
I have tried creating subqueries in the WHERE function to remove the duplicates but my understanding of SQL is not good enough to really figure out what is happening.
Group By and DISTINCT does not work for this case.
I added a Decision Tree to help visualize the problem. Essentially, each ID3 can potentially have multiple ID2s, which can potentially have multiple ID1s. I want to keep only the largest ID1, which gives me the correct ID2 associated with that ID3.
CodePudding user response:
with t1 as (
select ID1, ID2
from
(
select *
,row_number() over(partition by ID2 order by ID1 desc) as rn
from t
) t
where rn = 1
),
t3 as (
select ID2, ID3
from
(
select *
,row_number() over(partition by ID3 order by ID2 desc) as rn
from t2
) t
where rn = 1
)
select t1.ID1
,t1.ID2
,t3.ID3
from t1 join t3 on t3.ID2 = t1.ID2
order by ID3
ID1 | ID2 | ID3 |
---|---|---|
2 | 2 | 1 |
7 | 5 | 2 |
5 | 4 | 3 |
10 | 6 | 4 |
9 | 7 | 5 |