How do I define a column without including it in the output-CodePudding

The only columns I want are
COUNT(DISTINCT PATIENT_ID)
YEAR
IN or Out

How do I define "Name" which is columns FIRST_NAME LAST_NAME

I have a bunch of names that I need to compare to a data set on snowflake and separate them as IN if that match or OUT if they dont

SELECT 
COUNT(DISTINCT PATIENT_ID),
FIRST_NAME,
LAST_NAME,
CONCAT(FIRST_NAME,' ', LAST_NAME) as Name,
(CASE WHEN (
    (Name LIKE '%First Name1%')
 OR (Name LIKE '%First Name2%')
 OR (Name LIKE '%First Name3%')
 OR (Name LIKE '%First Name4%')
) 
    
    THEN 'IN'
       ELSE 'OUT' END) as "IN or Out",
YEAR(SERVICE_DATE) Year
        
from "dataset" dx


WHERE 
 (dx.primary_specialty_desc like ('%GASTRO%')
 AND    dx.diagnosis_code in ('K7581' , 'K740' , 'K760' , 'K746') 
 AND    dx.SERVICE_DATE >= '2017-01-01') 
 or    ( dx.primary_specialty_desc like('%HEP%')
 AND    dx.diagnosis_code in ('K7581' , 'K740' , 'K760' , 'K746') 
 AND    dx.SERVICE_DATE >= '2017-01-01')
 
 GROUP BY
  Year,
  FIRST_NAME,
  LAST_NAME,
 SERVICE_DATE,
 (CASE WHEN (
    (Name LIKE '%First Name1%')
 OR (Name LIKE '%First Name2%')
 OR (Name LIKE '%First Name3%')
 OR (Name LIKE '%First Name4%')
 
) 
    
    THEN 'IN'
       ELSE 'OUT' END)

CodePudding user response：

It will be better to use subquery in this case like

SELECT COUNT(DISTINCT PATIENT_ID),
FIRST_NAME,
LAST_NAME,
(CASE WHEN (
    (Name LIKE '%First Name1%')
 OR (Name LIKE '%First Name2%')
 OR (Name LIKE '%First Name3%')
 OR (Name LIKE '%First Name4%')
    THEN 'IN'
       ELSE 'OUT' END as IN_OR_OUT)
 ), Year  FROM (

SELECT 
PATIENT_ID,
FIRST_NAME,
LAST_NAME,
CONCAT(FIRST_NAME,' ', LAST_NAME) as Name,
YEAR(SERVICE_DATE) Year
        
from "dataset" dx


WHERE 
 (dx.primary_specialty_desc like ('%GASTRO%')
 AND    dx.diagnosis_code in ('K7581' , 'K740' , 'K760' , 'K746') 
 AND    dx.SERVICE_DATE >= '2017-01-01') 
 or    ( dx.primary_specialty_desc like('%HEP%')
 AND    dx.diagnosis_code in ('K7581' , 'K740' , 'K760' , 'K746') 
 AND    dx.SERVICE_DATE >= '2017-01-01')
 ) AS t
 
  GROUP BY
FIRST_NAME,
LAST_NAME,
Year,
IN_OR_OUT

Just check the typo. You can check the performance with EXPLAIN plan and decide

CodePudding user response：

You have a kind of transaction table where the patient_id is stored along with the first and last names. This looks redundant on first gance, but we know that names can change, when people get married for instance, so this information may be needed.

Now, there may be patient #1, who is recorded as Ann Smith four times and later as Ann Miller five times. One name may be "IN", one may be "OUT". Or both are "IN" or both are "OUT". You just want to consider up to two entries (#1/'IN' and #1/'OUT') per year. So getting these distinct rows should be the first step in my opinion.

You can then use this data set to do your aggregation and count.

select service_year, in_or_out, count(*)
from
(
  select distinct
    year(service_date) as service_year,
    patient_id,
    case when concat(first_name,' ', last_name) like '%First Name1%'
           or concat(first_name,' ', last_name) like '%First Name2%'
           or concat(first_name,' ', last_name) like '%First Name3%'
           or concat(first_name,' ', last_name) like '%First Name4%'
         then 'IN' else 'OUT'
    end as in_or_out
  from "dataset"
  where diagnosis_code in ('K7581' , 'K740' , 'K760' , 'K746')
  and service_date >= date '2017-01-01' 
  and 
  (
    primary_specialty_desc like '%GASTRO%'
    or
    primary_specialty_desc like '%HEP%'
  )
) distinct_data_set
group by service_year, in_or_out
order by service_year, in_or_out;