Home > Software engineering >  Partitioning/Grouping/listing over data that have already been partitioned with ROW_NUMBER
Partitioning/Grouping/listing over data that have already been partitioned with ROW_NUMBER

Time:07-30

I have made a TSQL script that pretty much takes/filters sets of data from a DB(multiple tables) and dump it on other table on a different DB. So far so good.

This is how the script looks like:

DECLARE @FechaInicio DATETIME = '2022-01-26 00:00:00.000';
DECLARE @FechaFin DATETIME = '2022-02-25 23:59:00.000'

WHILE (@FechaInicio <= @FechaFin)
BEGIN
    PRINT '
    ;WITH CTE_CORTES AS (
        SELECT
            XX.PACIENTE_nVECES_REPETIDOS,
            XX.FECHA_DEL_ESTUDIO,
            XX.UNIDAD_TRATANTE,
            XX.ApellidoPaterno,
            XX.ApellidoMaterno,
            XX.NOMBRE_DEL_PACIENTE,
            XX.NSS,
            XX.CLAVE_CPIM,
            XX.AGREGADO_MEDICO,
            XX.TIPO_DE_ESTUDIO,
            XX.MODALIDAD,
            XX.INTERPRETACION,
            XX.PARTICION_UID
        FROM
        (
            SELECT
                *,
                ROW_NUMBER() OVER (PARTITION BY WW.UIDESTUDIO ORDER BY WW.FECHA_DEL_ESTUDIO) AS PARTICION_UID
            FROM
                (
                    SELECT
                        ROW_NUMBER() OVER(PARTITION BY (SUBSTRING(R.NSS,1,10)) ORDER BY R.FECHA_DEL_ESTUDIO) AS PACIENTE_nVECES_REPETIDOS,
                        R.UIDESTUDIO,
                        R.FECHA_DEL_ESTUDIO,
                        R.UNIDAD_TRATANTE,
                        R.ApellidoPaterno,
                        R.ApellidoMaterno,
                        R.NOMBRE_DEL_PACIENTE,
                        R.NSS,
                        R.CLAVE_CPIM,
                        R.AGREGADO_MEDICO,
                        R.TIPO_DE_ESTUDIO,
                        R.MODALIDAD,
                        CASE
                            WHEN R.RESULTADO IS NULL
                                THEN ''SIN INTERPRETACION''
                            ELSE CONVERT(VARCHAR, R.RESULTADO, 103)
                        END AS INTERPRETACION
                    FROM
                        (
                            SELECT
                                COUNT(T.UIDESTUDIO) AS VECES_REPETIDO,
                                *
                            FROM
                            (
                                SELECT
                                    --ROW_NUMBER() OVER(ORDER BY FECHAESTUDIO) AS CONSECUTIVO,
                                    E.IdPaciente,
                                    E.FOLIO AS UIDESTUDIO,
                                    CAST (E.FechaEstudio AS smalldatetime) AS FECHA_DEL_ESTUDIO,
                                    ''HGZ 98'' AS UNIDAD_TRATANTE,
                                    P.ApellidoPaterno,
                                    P.ApellidoMaterno,
                                    CONCAT(P.PrimerNombre,'' '' /*COLLATE Modern_Spanish_CI_AI*/, P.SegundoNombre) AS NOMBRE_DEL_PACIENTE,
                                    NSS = left(P.Folio   replicate(''0'', 10), 10),
                                    CASE    
                                        WHEN E.Modalidades=''CR''
                                            THEN ''80.15.001''      
                                        WHEN E.Modalidades=''DX''
                                            THEN ''80.15.001''
                                        WHEN E.Modalidades=''MG''
                                            THEN ''80.15.002''
                                        WHEN E.Modalidades=''BDUS'' OR E.Modalidades=''BMD''
                                            THEN ''80.15.003''
                                        WHEN E.Modalidades=''RF''
                                            THEN ''80.15.004''
                                        WHEN E.Modalidades= ''US''
                                            THEN    (
                                                        CASE 
                                                            WHEN E.Descripcion NOT LIKE ''%DOPPLER%'' /*OR E.Descripcion NOT LIKE ''%DOPLER%''*/ THEN ''80.15.005''
                                                            WHEN E.Descripcion  LIKE ''%DOPPLER%'' OR E.Descripcion LIKE ''%DOPLER%'' THEN ''80.15.006''
                                                        ELSE ''80.15.005''
                                            END     )
                                        WHEN E.Modalidades= ''CT''
                                            THEN    (
                                                        CASE    
                                                            WHEN E.Descripcion NOT LIKE ''%CONTRAST%'' THEN ''80.15.007''
                                                            WHEN E.Descripcion LIKE ''%CONTRASTADO%'' /*OR E.Descripcion LIKE ''%GADOLIN%''*/ THEN ''80.15.008''
                                                        ELSE ''80.15.007''
                                            END     )
                                        WHEN E.Modalidades=''MR''
                                            THEN    (
                                                        CASE    
                                                            WHEN E.Descripcion NOT LIKE ''%CONTRASTADO%'' OR E.Descripcion NOT LIKE ''%GADOLIN%'' THEN ''80.15.009''
                                                            WHEN E.Descripcion  LIKE ''%CONTRAST%'' OR E.Descripcion LIKE ''%GADOLIN%'' THEN ''80.15.010''
                                                        ELSE ''80.15.009''
                                            END     )
                                        WHEN E.Modalidades=''XA''
                                            THEN ''80.15.011''
                                        WHEN E.Modalidades = ''ES''
                                            THEN ''80.15.012''
                                        ELSE ''80.15.014''
                                    END AS CLAVE_CPIM,
                                    AGREGADO_MEDICO = LEFT(SUBSTRING(P.Folio,11,18)   REPLICATE(''0'',8),8),
                                    CASE    
                                        WHEN E.Modalidades=''CR''
                                            THEN ''Radiología Simple''                  
                                        WHEN E.Modalidades=''DX''
                                            THEN ''Radiología Simple''
                                        WHEN E.Modalidades=''MG''
                                            THEN ''Mastografía''
                                        WHEN E.Modalidades=''BDUS'' OR E.Modalidades=''BMD''
                                            THEN ''Densitometría''
                                        WHEN E.Modalidades=''RF''
                                            THEN ''Radiología Contrastada''
                                        WHEN E.Modalidades=''US''
                                            THEN    (
                                                        CASE
                                                            WHEN E.Descripcion NOT LIKE ''%DOPPLER%'' /*OR E.Descripcion NOT LIKE ''%DOPLER%''*/ THEN ''Ultrasonido''
                                                            WHEN E.Descripcion LIKE ''%DOPPLER%'' OR E.Descripcion LIKE ''%DOPLER%'' THEN ''Ultrasonido Doppler''
                                                        ELSE ''Ultrasonido''
                                            END     )
                                        WHEN E.Modalidades=''CT''
                                            THEN    (
                                                        CASE
                                                            WHEN E.Descripcion NOT LIKE ''%CONTRAST%'' THEN ''Tomografía Computada Simple''
                                                            WHEN E.Descripcion  LIKE ''%CON CONTRASTE%'' OR E.Descripcion LIKE ''%CONTRASTADO%'' OR E.Descripcion LIKE ''%CONTRASTADA%'' OR E.Descripcion LIKE ''%GADOLIN%'' THEN ''Tomografía Computada con medio de Contraste''
                                                        ELSE ''Tomografía Computada Simple''                                
                                            END     )
                                        WHEN E.Modalidades=''MR''
                                            THEN    (
                                                        CASE
                                                            WHEN E.Descripcion NOT LIKE ''%CONTRAST%'' THEN ''Resonancia Magnética Simple''
                                                            WHEN E.Descripcion  LIKE ''%CON CONTRAST%'' OR E.Descripcion LIKE ''%CONTRASTADO%'' OR E.Descripcion LIKE ''%CONTRASTADA%'' OR E.Descripcion LIKE ''%GADOLIN%'' THEN ''Resonancia Magnética Contrastada''
                                                        ELSE ''Resonancia Magnética Simple''
                                            END     )
                                        WHEN E.Modalidades=''XA''
                                            THEN ''RADIOLOGIA INTERVENCIONISTA VASCULAR''
                                        WHEN E.Modalidades=''ES''
                                            THEN ''RADIOLOGIA INTERVENCIONISTA NO VASCULAR''
                                        ELSE ''OTRAS MODALIDADES DICOM''
                                    END AS TIPO_DE_ESTUDIO,
                                    E.Modalidades AS MODALIDAD,
                                    CASE
                                        WHEN I_R_E.IdEstadoResultado IN (0,1,2)
                                            THEN ''SIN INTERPRETACION''
                                        ELSE CONVERT(VARCHAR,I_R_E.FechaResultado,103)
                                    END AS RESULTADO
                                FROM
                                    DBO.Pacientes AS P
                                INNER JOIN
                                    DBO.ImagenologiaEstudios AS E ON P.IDPACIENTE=E.IdPaciente
                                LEFT JOIN
                                    dbo.ImagenologiaResultadosEstudio AS I_R_E ON E.IdEstudio=I_R_E.IdEstudio AND E.IdEstudio=I_R_E.IdEstudio
                                WHERE
                                    E.FechaEstudio BETWEEN '   ''''   CONVERT(VARCHAR,@FechaInicio,121)   ''''    ' AND '   ''''   CONCAT(CONVERT(VARCHAR,@FechaInicio,23),' 23:59:59.999')   ''''    ' AND E.VisiblePACS = 1
                        ) AS T
                        GROUP BY
                            T.IdPaciente,
                            T.UIDESTUDIO,
                            T.FECHA_DEL_ESTUDIO,
                            T.UNIDAD_TRATANTE,
                            T.ApellidoPaterno,
                            T.ApellidoMaterno,
                            T.NOMBRE_DEL_PACIENTE,
                            T.NSS,
                            T.CLAVE_CPIM,
                            T.AGREGADO_MEDICO,
                            T.TIPO_DE_ESTUDIO,
                            T.MODALIDAD,
                            T.RESULTADO
                        HAVING
                            COUNT(T.UIDESTUDIO) >= 1
                    ) AS R
            ) AS WW 
        ) AS XX
                    )
        INSERT INTO HIS_CORTES_UDDCM.dbo.CORTE_MES_EJEMPLO (
            [PACIENTE_nVECES_REPETIDOS],[FECHA_DEL_ESTUDIO],[UNIDAD_TRATANTE],[ApellidoPaterno],[ApellidoMaterno],[NOMBRE_DEL_PACIENTE],[NSS],[CLAVE_CPIM],[AGREGADO_MEDICO],[TIPO_DE_ESTUDIO],[MODALIDAD],[INTERPRETACION],[PARTICION_UID])
        SELECT
            TOP 1000 [PACIENTE_nVECES_REPETIDOS],[FECHA_DEL_ESTUDIO],[UNIDAD_TRATANTE],[ApellidoPaterno],[ApellidoMaterno],[NOMBRE_DEL_PACIENTE],[NSS],[CLAVE_CPIM],[AGREGADO_MEDICO],[TIPO_DE_ESTUDIO],[MODALIDAD],[INTERPRETACION],[PARTICION_UID]
        FROM
            CTE_CORTES
        WHERE
            CTE_CORTES.PARTICION_UID=1
        ORDER BY
            CTE_CORTES.NSS, CTE_CORTES.FECHA_DEL_ESTUDIO'
    SET @FechaInicio  = 1
    PRINT '/*#######################################################################################SALTO DE DIA#######################################################################################*/'
END

The whole script relies on a line of code that pretty much partitions sets of data(I'm using ROW_NUMBER and nesting the SUBSTRING function) and the ordering of the IDs and date is taken cared of by the ORDER BY clause(which is as importat as the ROW_NUMBER window function).

This is the line I'm refering to:

enter image description here

What this does is basically takes the ID column(string) and goes up to 10 characters. If there are similarities between the first 10 characters among the ID column rows, then the ROW_NUMBER function groups/enumerates each row from 1 up to number of coincidences found within a single day(It cannot go beyond a day. Because if it does, data are not longer reliable. That's the reason why the script above inserts data day by day by using the WHILE loop).

This is how the table looks like after executing the TSQL script:

enter image description here

If you pay close attention to the first column and the rows highlighted in blue, you'll see the partitioning/grouping by sets I was talking about being performed. Rows and IDs have been sorted/ordered by date and ID(ascending) in an organized manner. Similarities between the IDs are being displayed in the nTIMES_ID_REPEATED column. The main focus is on the rows highlighted in blue. Those are the ones that meet the criteria I'm about to explain. The ones highlighted in yellow also meet the criteria(similarities) but cannot be taken into account since their respective REPORT_STATUS column(Date type) are empty.

Now, this is where it gets challenging(and honestly, I have no idea how to tackle this challenge). Based on the partitioning/grouping that has been done with the nTIMES_ID_REPEATED column, I've been tasked to access those rows(the ones that display 1, 2, 3, up to the last number) and perform(maybe) another window function to list them again(display them on a different column) as long as their respective REPORT_STATUS column is not empty.

This is how the desired result should look like:

enter image description here

Eventhough the ROW_COUNT together with theSUBSTRING function have found more matches(nTIMES_ID_REPEATED column), the rows in yellow cannot be taken into account since its REPORT_STATUS column are empty. Basically they should be ignored.

By the way, I made a dummy table and populated it with the very same challenge for those who want to give it a try at enter image description here

CodePudding user response:

I believe these would both work. You might find that one performs better than the other:

case when report_status <> '' then
    row_number() over (
        partition by <ID>
        -- push the nulls out to the end so they don't interfere with numbering
        order by case when report_status <> '' then 0 else 1 end, <date>) end

Or

case when report_status <> '' then
    row_number() over (
        -- move the null rows into a distinct partition
        partition by <ID>, case when report_status <> '' then 1 end
        order by <date>) end

It seems that your column may be a varchar(255) rather than a date. Replace the null checks with whatever test is appropriate for your logic.

  • Related