Home > Blockchain >  SELECT only rows based on double values on 2 columns
SELECT only rows based on double values on 2 columns

Time:09-24

I have a database that was given to us without constraints, no PK or FK.

I'm trying to reverse engineer to find out connections between tables. In order to do so my approach is to create a list of all columns with the same name and data type. So I created this query:

SELECT schema_name(tab.schema_id) AS schema_name
    ,tab.name AS table_name
    ,col.name AS column_name
    ,t.name AS data_type
    ,SUM([Partitions].[rows]) AS [TotalRowCount]
    
FROM sys.tables AS tab
INNER JOIN sys.columns AS col ON tab.object_id = col.object_id
LEFT JOIN sys.types AS t ON col.user_type_id = t.user_type_id
JOIN sys.partitions AS [Partitions] ON tab.[object_id] = [Partitions].[object_id]
    AND [Partitions].index_id IN (
        0
        ,1
        )

GROUP BY schema_name(tab.schema_id)
    ,tab.name
    ,col.name
    ,t.name
    
ORDER BY col.name 

As result I have this:

schema_name table_name column_name data_type TotalRowCount
H513WEB ALPRETAR IDREVI numeric 1439
H513WEB ALPRETAR IDREVN numeric 1439
H513WEB CPDOENP IDRLPP numeric 4156
H513WEB ALPRETAR IDRNG numeric 1439
H513WEB FAAFACP IDROLDEBCOM numeric 100541
H513WEB FAAFACP_OLD IDROLDEBCOM numeric 1513
H513WEB CACPTAP IDROLE numeric 22109
H513WEB CARROLP IDROLE numeric 5
H513WEB FAENTFP IDROLE numeric 26084
H513WEB CACPTCP IDRUBRIQUE numeric 48
H513WEB CACPTRP IDRUBRIQUE numeric 12942
H513WEB CAIMDCP IDRUBRIQUE numeric 6637
H513WEB CAIMGIP IDRUBRIQUE numeric 5
H513WEB CAIMRSP IDRUBRIQUE numeric 40
H513WEB CAOREXP IDRUBRIQUE numeric 5
H513WEB CAPLARP IDRUBRIQUE numeric 3
H513WEB CARERUP IDRUBRIQUE numeric 175
H513WEB CARUBRP IDRUBRIQUE numeric 312

but I only would like to list rows that have the same column_name and data_type and the result should look like this:

schema_name table_name column_name data_type TotalRowCount
H513WEB FAAFACP IDROLDEBCOM numeric 100541
H513WEB FAAFACP_OLD IDROLDEBCOM numeric 1513
H513WEB CACPTAP IDROLE numeric 22109
H513WEB CARROLP IDROLE numeric 5
H513WEB FAENTFP IDROLE numeric 26084
H513WEB CACPTCP IDRUBRIQUE numeric 48
H513WEB CACPTRP IDRUBRIQUE numeric 12942
H513WEB CAIMDCP IDRUBRIQUE numeric 6637
H513WEB CAIMGIP IDRUBRIQUE numeric 5
H513WEB CAIMRSP IDRUBRIQUE numeric 40
H513WEB CAOREXP IDRUBRIQUE numeric 5
H513WEB CAPLARP IDRUBRIQUE numeric 3
H513WEB CARERUP IDRUBRIQUE numeric 175
H513WEB CARUBRP IDRUBRIQUE numeric 312

How to do that?

CodePudding user response:

If you're trying to identify PK and FK candidates, it's not just the data type that needs to match, but the length, scale and precision too. A varchar(10) can't be a foreign key of a varchar(8) for example. In actuality the name doesn't need to match at all, but you would end up with a terrible number of false positives without including that.

I've guessed a little at what you want but i think this might be along the right lines. The (estimated) row count, on the other hand, doesn't seem necessary at all, so I've removed that from the query:

WITH ColumnCount AS(
    SELECT s.name AS SchemaName,
           t.name AS TableName,
           c.name AS ColumnName,
           ct.[name] AS DataType,
           c.max_length,
           c.precision,
           c.scale,
           COUNT(c.column_id) OVER (PARTITION BY c.[name], ct.[name], c.max_length, c.precision, c.scale) AS Duplicates
    FROM sys.schemas s
         JOIN sys.tables t ON s.schema_id = t.schema_id
         JOIN sys.columns c ON t.object_id = c.object_id
         JOIN sys.types ct ON c.user_type_id = ct.user_type_id)
SELECT *
FROM ColumnCount CC
WHERE CC.Duplicates > 1
ORDER BY CC.ColumnName,
         CC.SchemaName,
         CC.TableName;
  • Related