Home > Software design >  How do I self join this table multiple times?
How do I self join this table multiple times?

Time:07-12

I have a table that lists the following columns:

employee ssn, dependent ssn, last name, first name, relationship, benefit amount

in the relationship column, it will show either employee, child, or spouse.

What I need is the following:

employee ssn, last name, first name, ee benefit, child benefit, spouse benefit.

I need to join the benefit amount column where the employee ssn matches up, but the problem is whenever I do this, it gives me multiples of the same employee, and sometimes the child and/or spouse benefit isn't correct.

Here's what I've written:

SELECT a.[employee ssn], a.[last name], a.[first name], a.[benefit amount], b.[benefit amount] AS "child benefit", c.[benefit amount] AS "spouse benefit"
FROM allenrollments a
JOIN allenrollments b ON b.[employee ssn] = a.[employee ssn]
JOIN allenrollments c ON c.[employee ssn] = a.[employee ssn]
WHERE a.relationship = "Employee" AND b.relationship = "Child" AND c.relationship = "Spouse"

CodePudding user response:

You are getting a common problem of a cross-join or Cartesian product. For every record in the first table joined to the second is also combined with the third (in this case based on the common employee SSN.

Note, many systems try to do away with SSN as a primary key for many reasons and typically have auto-increment number and only use the SSN as the basis of a lookup to the details. Otherwise, you have SSN flooded throughout your system, and exposure if hacked is not good (although getting hacked is never a good thing when by a bad/malicious actor).

Now, that said and what you appear to have. You are trying to get data for a spouse and also child. But what if a family insurance benefit plan has 5 children. What is your final goal? What you might just be better off with is getting all rows and just having a column indicating WHO it is associated with. Then you can do whatever count/sum coverage checks once you get the data for the entire family plan after.

Also, having spaces within column names it not a good thing. You will always be chasing unbalanced [brackets] to contend with, output results queried to a class structure such as getting a list of things, etc. Having said all that, I would recommend trying

SELECT 
      emp.[employee ssn] SSN, 
      emp.[last name] EmpLastName, 
      emp.[first name] EmpFirstName, 
      emp.[benefit amount] EmpBenefitAmount,     
      fam.[last name] FamilyMemberLastName, 
      fam.[first name] FamilyMemberFirstName, 
      fam.relationship FamilyRelationship,
      case when fam.relationship = 'Spouse'
           then fam.[benefit_amount]
           else 0 end SpouseBenefit,
      case when fam.relationship = 'Spouse'
           then 0
           else fam.[benefit_amount] end ChildBenefit
   FROM 
      allenrollments emp
         JOIN allenrollments fam
            on emp.[employee ssn] = fam.[employee ssn]
           AND NOT emp.relationship = 'Employee'
   where
      emp.relationship = 'Employee'

So the first table is based entirely for the employee who is covered. Then joining back to the same table by the SSN to find all OTHER family members. I dont think you even need to go to the level of SpouseBenefit vs ChildBenefit columns. It could just be the benefit amount AS FamilyMemberBenefitAmount.

CodePudding user response:

I'm assuming there's some underlying data issue and that is why we're self joining, in which case:

WITH dependants as (
SELECT 
 [employee ssn],
 [benefit amount],
 RANK() 
  OVER (
   PARTITION BY [employee ssn] 
   ORDER BY [benefit amount] DESC
  ) as unique_key
FROM allenrollments
WHERE 
 [relationship] IN('Child', 'Spouse')
)

SELECT 
 emp.[employee ssn],
 emp.[first name],
 emp.[last name],
 emp.[benefit amount] as EmployeeBenefit,
 dep.[benefit amount] as DependantBenefit
FROM allenrollments emp
JOIN 
 dependants dep on emp.[employee ssn] = dep.[employee ssn]
WHERE 
 dep.unique_key = 1 AND 
 emp.relationship = 'Employee'

Here we are using a CTE to create a query that just lists all dependent benefits and their associated employee ssns. We are using a RANK() function to give us something we can filter on (since we would get multiple records per employee ssn if there are multiple dependents). We then filter for values 'Employee' and the 1st rank in the outer query.

  • Related