How to join tables selecting duplicates with condition-CodePudding

I have 2 tables

TicketHeaders TH

ticketID	Amount
1	600
2	900
3	400

TicketBody TB

ticketID	SellerName	SellerType
1	Karen	Manager
1	James	Trainee
2	John	Manager
3	James	Trainee

What I need is to get a table with TicketID - Amount - SellerName, but if I have a ticket with 2 sellers, I need to select only the manager for that particular ticket.

The output table should be:

ticketID	Amount	SellerName
1	600	Karen
2	900	John
3	400	James

If I use left join, I get duplicate amounts for ticket 1

SELECT TH.ticketID, TH.Amount, TB.SellerName
FROM TH
LEFT JOIN TB ON TH.ticketID = TB.ticketID

CodePudding user response：

SELECT TH.ticketID, TH.Amount, COALESCE(TB_M.SellerName, TB_T.SellerName)
FROM TH
LEFT JOIN TB TB_M ON TH.ticketID = TB_M.ticketID AND TB_M.SellerType = 'Manager'
LEFT JOIN TB TB_T ON TH.ticketID = TB_T.ticketID AND TB_T.SellerType <> 'Manager'

CodePudding user response：

Based on the stated version 2.5 it does appear that row_number() solutions would not be available to you. You could approach this with a single inner join. I don't know if there's possibly any benefit to avoiding the extra join in Firebird.

select ticketID, min(Amount) as Amount,
    case min(case SellerType when 'Manager' then 1 else 2 end) when 1
        then min(case SellerType =  'Manager' then SellerName end)
        else min(case SellerType <> 'Manager' then SellerName end)
    end SellerName
from TH th inner join TB tb on tb.ticketID = th.ticketID
group by ticketID

Another benefit is that this would work for a larger hierarchy of different sellers (by adding new cases.) It wouldn't work if there were multiple sellers at a single level though.

CodePudding user response：

I would use a row_number() in this scenario:

with _cte as (
     SELECT TH.ticketID, 
            TH.Amount, 
            TB.SellerName, 
            row_number() over (partition by TH.ticketID order by case when SellerType = 'Manager' then 0 else 1 end) as rn
     FROM TH
     LEFT JOIN TB ON TH.ticketID = TB.ticketID
)
select ticketID, Amount, SellerName
from _cte 
where rn = 1

CodePudding user response：

A.S. I think your problem is not well defined yet:

I need to select only the manager for that particular ticket

What if there would be many rows, but two or more managers? zero managers? Without thinking it through, like making warranties SQL server would never allow such data to be inserted, it is an error waiting to happen.

I also think SellerType should better be an integer field, a foreign key to some Seller_types dictionary table - both because integer fields are easier indexed and compared for joining tables, and because that would allow you to later rename "functional roles" as you please (or as you was bossed to do), without changing a thing: your Seller_types can have extra columns like integer role_priority or even something like max_persons_of_type_in_one_ticket (you bosses might, just for example, decide there can be two managers on one ticket, or a manager and a vice-manager, and then no more than four trainees).

But back to the question, there is one more approach to do it. It would de facto run a correlated sub-query for every row in TicketHeaders table, so it would probably be slower if you do "long" selects with thousands rows. Especially if you keep default small memory caches of Firebird (see articles on configuring Firebird and relaxed configs on ib-aid.com).

On the other hand, it akes to "do once and forget", making your queries simpler, thus less chance for you to err in future. And speed penalty would probably be unnoticeable on short (100 and less rows) queries. And, there would be no penalty if you would not query that new column at all (you do NOT use select * queries in production, do you?).

So, the code, finally: db<>fiddle here

create table Ticket_Headers (
  ticket_id integer primary key, 
  amount integer not null
)

create table Ticket_Body (
  ticket_id integer 
     REFERENCES Ticket_Headers(ticket_id) 
        ON DELETE CASCADE
        ON UPDATE CASCADE, 
  Seller_Name varchar(20) not null,
  Seller_Type varchar(20) not null,
  CONSTRAINT TicketBody_PK PRIMARY KEY (ticket_id, Seller_Name)
)

create index idx_TicketBody_Type on Ticket_Body(Seller_Type)

insert into Ticket_Headers
select 1, 600 from rdb$database union all
select 2, 900 from rdb$database union all
select 3, 400 from rdb$database

3 rows affected

insert into Ticket_body
select 1, 'Karen', 'Manager' from rdb$database union all
select 1, 'James', 'Trainee' from rdb$database union all
select 2, 'John',  'Manager' from rdb$database union all
select 3, 'James', 'Trainee' from rdb$database

4 rows affected

select * from Ticket_Headers

TICKET_ID | AMOUNT
--------: | -----:
        1 |    600
        2 |    900
        3 |    400

select * from Ticket_Body

TICKET_ID | SELLER_NAME | SELLER_TYPE
--------: | :---------- | :----------
        1 | Karen       | Manager    
        1 | James       | Trainee    
        2 | John        | Manager    
        3 | James       | Trainee

select * from Ticket_Headers TH, Ticket_Body TB where TH.ticket_id = TB.ticket_ID

TICKET_ID | AMOUNT | TICKET_ID | SELLER_NAME | SELLER_TYPE
--------: | -----: | --------: | :---------- | :----------
        1 |    600 |         1 | Karen       | Manager    
        1 |    600 |         1 | James       | Trainee    
        2 |    900 |         2 | John        | Manager    
        3 |    400 |         3 | James       | Trainee

alter table Ticket_Headers
  add Seller_Top computed by 
  ( -- this parenthesis is required by COMPUTED BY SQL syntax
   ( -- this parenthesis is required to coerce SELECT from query to expression 
    select First(1) TB.Seller_Name
      from Ticket_Body TB
     where TB.ticket_id = Ticket_Headers.ticket_id
    order by TB.Seller_type /* Descending - if other order to be needed */ 
   ) 
  )

select * from Ticket_Headers

TICKET_ID | AMOUNT | SELLER_TOP
--------: | -----: | :---------
        1 |    600 | Karen     
        2 |    900 | John      
        3 |    400 | James

The aforementioned Seller_types.role_priority would be much more flexible thus future-proof approach for such an order-by.