SELECT only rows based on double values on 2 columns-CodePudding

I have a database that was given to us without constraints, no PK or FK.

I'm trying to reverse engineer to find out connections between tables. In order to do so my approach is to create a list of all columns with the same name and data type. So I created this query:

SELECT schema_name(tab.schema_id) AS schema_name
    ,tab.name AS table_name
    ,col.name AS column_name
    ,t.name AS data_type
    ,SUM([Partitions].[rows]) AS [TotalRowCount]
    
FROM sys.tables AS tab
INNER JOIN sys.columns AS col ON tab.object_id = col.object_id
LEFT JOIN sys.types AS t ON col.user_type_id = t.user_type_id
JOIN sys.partitions AS [Partitions] ON tab.[object_id] = [Partitions].[object_id]
    AND [Partitions].index_id IN (
        0
        ,1
        )

GROUP BY schema_name(tab.schema_id)
    ,tab.name
    ,col.name
    ,t.name
    
ORDER BY col.name

As result I have this:

schema_name	table_name	column_name	data_type	TotalRowCount
H513WEB	ALPRETAR	IDREVI	numeric	1439
H513WEB	ALPRETAR	IDREVN	numeric	1439
H513WEB	CPDOENP	IDRLPP	numeric	4156
H513WEB	ALPRETAR	IDRNG	numeric	1439
H513WEB	FAAFACP	IDROLDEBCOM	numeric	100541
H513WEB	FAAFACP_OLD	IDROLDEBCOM	numeric	1513
H513WEB	CACPTAP	IDROLE	numeric	22109
H513WEB	CARROLP	IDROLE	numeric	5
H513WEB	FAENTFP	IDROLE	numeric	26084
H513WEB	CACPTCP	IDRUBRIQUE	numeric	48
H513WEB	CACPTRP	IDRUBRIQUE	numeric	12942
H513WEB	CAIMDCP	IDRUBRIQUE	numeric	6637
H513WEB	CAIMGIP	IDRUBRIQUE	numeric	5
H513WEB	CAIMRSP	IDRUBRIQUE	numeric	40
H513WEB	CAOREXP	IDRUBRIQUE	numeric	5
H513WEB	CAPLARP	IDRUBRIQUE	numeric	3
H513WEB	CARERUP	IDRUBRIQUE	numeric	175
H513WEB	CARUBRP	IDRUBRIQUE	numeric	312

but I only would like to list rows that have the same column_name and data_type and the result should look like this:

schema_name	table_name	column_name	data_type	TotalRowCount
H513WEB	FAAFACP	IDROLDEBCOM	numeric	100541
H513WEB	FAAFACP_OLD	IDROLDEBCOM	numeric	1513
H513WEB	CACPTAP	IDROLE	numeric	22109
H513WEB	CARROLP	IDROLE	numeric	5
H513WEB	FAENTFP	IDROLE	numeric	26084
H513WEB	CACPTCP	IDRUBRIQUE	numeric	48
H513WEB	CACPTRP	IDRUBRIQUE	numeric	12942
H513WEB	CAIMDCP	IDRUBRIQUE	numeric	6637
H513WEB	CAIMGIP	IDRUBRIQUE	numeric	5
H513WEB	CAIMRSP	IDRUBRIQUE	numeric	40
H513WEB	CAOREXP	IDRUBRIQUE	numeric	5
H513WEB	CAPLARP	IDRUBRIQUE	numeric	3
H513WEB	CARERUP	IDRUBRIQUE	numeric	175
H513WEB	CARUBRP	IDRUBRIQUE	numeric	312

How to do that?

CodePudding user response：

If you're trying to identify PK and FK candidates, it's not just the data type that needs to match, but the length, scale and precision too. A varchar(10) can't be a foreign key of a varchar(8) for example. In actuality the name doesn't need to match at all, but you would end up with a terrible number of false positives without including that.

I've guessed a little at what you want but i think this might be along the right lines. The (estimated) row count, on the other hand, doesn't seem necessary at all, so I've removed that from the query:

WITH ColumnCount AS(
    SELECT s.name AS SchemaName,
           t.name AS TableName,
           c.name AS ColumnName,
           ct.[name] AS DataType,
           c.max_length,
           c.precision,
           c.scale,
           COUNT(c.column_id) OVER (PARTITION BY c.[name], ct.[name], c.max_length, c.precision, c.scale) AS Duplicates
    FROM sys.schemas s
         JOIN sys.tables t ON s.schema_id = t.schema_id
         JOIN sys.columns c ON t.object_id = c.object_id
         JOIN sys.types ct ON c.user_type_id = ct.user_type_id)
SELECT *
FROM ColumnCount CC
WHERE CC.Duplicates > 1
ORDER BY CC.ColumnName,
         CC.SchemaName,
         CC.TableName;