I'd like to retrieve all tables, along with the associated column values, where two specific columns (whose names will be passed in) do not have exactly the same content.
Here's a more concrete breakdown of the problem. Suppose the columns I need to look at are 'Column_1' and 'Column_2':
- First, identify from INFORMATION_SCHEMA which tables contain both of these columns (possibly one subquery).
- Then identify which of these tables have rows where the two columns differ, meaning Column_1 != Column_2.
The following query retrieves all the tables that have both 'Column_1' and 'Column_2'.
SELECT
TABLE_NAME
FROM
INFORMATION_SCHEMA.TABLES T
WHERE
T.TABLE_CATALOG = 'myDB' AND
T.TABLE_TYPE = 'BASE TABLE'
AND EXISTS (
SELECT T.TABLE_NAME
FROM INFORMATION_SCHEMA.COLUMNS C
WHERE
C.TABLE_CATALOG = T.TABLE_CATALOG AND
C.TABLE_SCHEMA = T.TABLE_SCHEMA AND
C.TABLE_NAME = T.TABLE_NAME AND
C.COLUMN_NAME = 'Column_1')
AND EXISTS
(
SELECT T.TABLE_NAME
FROM INFORMATION_SCHEMA.COLUMNS C
WHERE
C.TABLE_CATALOG = T.TABLE_CATALOG AND
C.TABLE_SCHEMA = T.TABLE_SCHEMA AND
C.TABLE_NAME = T.TABLE_NAME AND
C.COLUMN_NAME = 'Column_2')
As the next step, I tried to use this as a subquery with the following condition at the end, but that doesn't work: SQL Server returns 'Cannot call methods on sysname'. What should the next step be? Assume that all of these columns have exactly the same data type.
WHERE SUBQUERY.TABLE_NAME.Column_1 != SUBQUERY.TABLE_NAME.Column_2
This is the expected result:
Table_Name | Column_Name1 | Column_Value_1 | Column_Name2 | Column_Value_2 |
---|---|---|---|---|
Table_A | Column_1 | abcd | Column_2 | abcde |
Table_A | Column_1 | qwerty | Column_2 | qwert |
Table_A | Column_1 | abcde | Column_2 | eabcde |
Table_B | Column_1 | zxcv | Column_2 | zxcde |
Table_C | Column_1 | asdfgh | Column_2 | asdfghy |
Table_C | Column_1 | aaaa | Column_2 | bbbb |
CodePudding user response:
I believe you need to compare the CHARACTER_MAXIMUM_LENGTH or CHARACTER_OCTET_LENGTH metadata values in the INFORMATION_SCHEMA.COLUMNS table instead of using LEN(). This can be done using something like:
SELECT T.TABLE_NAME
, C1.COLUMN_NAME, C1.DATA_TYPE, C1.CHARACTER_MAXIMUM_LENGTH
, C2.COLUMN_NAME, C2.DATA_TYPE, C2.CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.TABLES T
JOIN INFORMATION_SCHEMA.COLUMNS C1
ON C1.TABLE_CATALOG = T.TABLE_CATALOG
AND C1.TABLE_SCHEMA = T.TABLE_SCHEMA
AND C1.TABLE_NAME = T.TABLE_NAME
AND C1.COLUMN_NAME = 'Column_1'
JOIN INFORMATION_SCHEMA.COLUMNS C2
ON C2.TABLE_CATALOG = T.TABLE_CATALOG
AND C2.TABLE_SCHEMA = T.TABLE_SCHEMA
AND C2.TABLE_NAME = T.TABLE_NAME
AND C2.COLUMN_NAME = 'Column_2'
WHERE T.TABLE_CATALOG = 'myDB'
AND T.TABLE_TYPE = 'BASE TABLE'
AND C1.CHARACTER_MAXIMUM_LENGTH <> C2.CHARACTER_MAXIMUM_LENGTH
The two inner joins limit the results to tables that have both columns and also retrieve the column metadata. The length comparison at the end checks for a mismatch.
This assumes character types. You might also want to check DATA_TYPE consistency ("char" vs "varchar" vs "nvarchar"), or the precision and scale values for non-character data types.
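The broader metadata check mentioned above could be sketched as follows. This is only an illustration: it assumes the query is run inside the target database, and it uses an INTERSECT comparison so that NULL metadata values (for example, CHARACTER_MAXIMUM_LENGTH on numeric columns) compare as equal. The output column aliases are made up for the example.

```sql
-- Sketch: list tables whose Column_1 and Column_2 differ in declared type,
-- length, precision, or scale. Run this in the target database.
SELECT T.TABLE_SCHEMA, T.TABLE_NAME
     , C1.DATA_TYPE AS Column_1_Type, C1.CHARACTER_MAXIMUM_LENGTH AS Column_1_Length
     , C2.DATA_TYPE AS Column_2_Type, C2.CHARACTER_MAXIMUM_LENGTH AS Column_2_Length
FROM INFORMATION_SCHEMA.TABLES T
JOIN INFORMATION_SCHEMA.COLUMNS C1
    ON C1.TABLE_SCHEMA = T.TABLE_SCHEMA
    AND C1.TABLE_NAME = T.TABLE_NAME
    AND C1.COLUMN_NAME = 'Column_1'
JOIN INFORMATION_SCHEMA.COLUMNS C2
    ON C2.TABLE_SCHEMA = T.TABLE_SCHEMA
    AND C2.TABLE_NAME = T.TABLE_NAME
    AND C2.COLUMN_NAME = 'Column_2'
WHERE T.TABLE_TYPE = 'BASE TABLE'
    -- INTERSECT treats two NULLs as equal, so this is a NULL-safe "not identical" test.
    AND NOT EXISTS (
        SELECT C1.DATA_TYPE, C1.CHARACTER_MAXIMUM_LENGTH, C1.NUMERIC_PRECISION, C1.NUMERIC_SCALE
        INTERSECT
        SELECT C2.DATA_TYPE, C2.CHARACTER_MAXIMUM_LENGTH, C2.NUMERIC_PRECISION, C2.NUMERIC_SCALE
    );
```

Like the query above, this compares column definitions only, not the values stored in the columns.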
CodePudding user response:
To query the data within the columns you need dynamic SQL. I would advise you not to use INFORMATION_SCHEMA (which is for compatibility only) and instead use sys.tables etc. You don't need to check sys.columns twice; you can use aggregation in the EXISTS subquery to check for multiple columns.
To compare the columns, you can do Column_1 <> Column_2, but that will not deal with nulls correctly. If the columns can be nullable then you should instead use the syntax shown in the code below: NOT EXISTS (SELECT Column_1 INTERSECT SELECT Column_2)
DECLARE @sql nvarchar(max);
-- Run this in the target database (myDB): sys.tables only covers the current database.
SELECT @sql =
    STRING_AGG(CAST('
SELECT
  Table_Name = ' + QUOTENAME(t.name, '''') + ',
  Column_1,
  Column_2
FROM ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + '
WHERE NOT EXISTS (SELECT Column_1 INTERSECT SELECT Column_2)
' AS nvarchar(max)), '
UNION ALL
' )
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE EXISTS (SELECT 1
    FROM sys.columns c
    WHERE c.object_id = t.object_id
      AND c.name IN ('Column_1', 'Column_2')
    HAVING COUNT(*) = 2
       AND COUNT(DISTINCT c.system_type_id) = 1  -- all same type
);
PRINT @sql;  -- your friend
EXEC sp_executesql @sql;
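The NULL handling described above can be checked on a small inline data set. The rows below are hypothetical sample values, not data from the question; no real table is needed.

```sql
-- Plain inequality: any comparison involving NULL evaluates to UNKNOWN,
-- so rows with a NULL on either side are filtered out.
SELECT v.Column_1, v.Column_2
FROM (VALUES ('abcd', 'abcd'),
             ('abcd', 'abcde'),
             ('abcd', NULL),
             (NULL,   NULL)) AS v (Column_1, Column_2)
WHERE v.Column_1 <> v.Column_2;
-- Returns only ('abcd', 'abcde').

-- INTERSECT treats two NULLs as equal and NULL vs. a value as different.
SELECT v.Column_1, v.Column_2
FROM (VALUES ('abcd', 'abcd'),
             ('abcd', 'abcde'),
             ('abcd', NULL),
             (NULL,   NULL)) AS v (Column_1, Column_2)
WHERE NOT EXISTS (SELECT v.Column_1 INTERSECT SELECT v.Column_2);
-- Returns ('abcd', 'abcde') and ('abcd', NULL).
```

The second form reports a row where only one side is NULL as a mismatch, while a row where both sides are NULL counts as a match.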
CodePudding user response:
If in fact you want to actually compare values (not length) between two columns in tables that contain those two columns, you will need to generate dynamic SQL and then execute it. This could be done semi-automatically with the following:
-- Template for one SELECT per table; the <...> placeholders are filled in below.
DECLARE @SqlTemplate VARCHAR(MAX) =
    'UNION ALL'
    + ' SELECT Table_Name = <TNAME>'
    + ', Column_Name1 = <C1NAME>, Column_Value_1 = <C1>'
    + ', Column_Name2 = <C2NAME>, Column_Value_2 = <C2>'
    + ' FROM <T>'
    + ' WHERE ISNULL(<C1>, ''(null)'') <> ISNULL(<C2>, ''(null)'')';
SELECT T.TABLE_NAME
    , REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
        @SqlTemplate
        , '<TNAME>', QUOTENAME(T.TABLE_SCHEMA + '.' + T.TABLE_NAME, ''''))
        , '<C1NAME>', QUOTENAME(C1.COLUMN_NAME, ''''))
        , '<C2NAME>', QUOTENAME(C2.COLUMN_NAME, ''''))
        , '<T>', QUOTENAME(T.TABLE_SCHEMA) + '.' + QUOTENAME(T.TABLE_NAME))
        , '<C1>', QUOTENAME(C1.COLUMN_NAME))
        , '<C2>', QUOTENAME(C2.COLUMN_NAME)) AS GeneratedSql
FROM INFORMATION_SCHEMA.TABLES T
JOIN INFORMATION_SCHEMA.COLUMNS C1
    ON C1.TABLE_CATALOG = T.TABLE_CATALOG
    AND C1.TABLE_SCHEMA = T.TABLE_SCHEMA
    AND C1.TABLE_NAME = T.TABLE_NAME
    AND C1.COLUMN_NAME = 'Column_1'
JOIN INFORMATION_SCHEMA.COLUMNS C2
    ON C2.TABLE_CATALOG = T.TABLE_CATALOG
    AND C2.TABLE_SCHEMA = T.TABLE_SCHEMA
    AND C2.TABLE_NAME = T.TABLE_NAME
    AND C2.COLUMN_NAME = 'Column_2'  -- must test C2 here, not C1
WHERE T.TABLE_CATALOG = 'myDB'
    AND T.TABLE_TYPE = 'BASE TABLE'
This would generate one row of SQL for each qualifying table, of the form:
UNION ALL SELECT Table_Name = 'dbo.Z', Column_Name1 = 'X', Column_Value_1 = [X], Column_Name2 = 'Y', Column_Value_2 = [Y] FROM [dbo].[Z] WHERE ISNULL([X], '(null)') <> ISNULL([Y], '(null)')
After running the above, you would then cut & paste the generated SQL into another query window, remove the initial 'UNION ALL', and then execute the remaining SQL to get the final results.
There are ways of combining all the SQL into a single string and executing it automatically, but your problem sounds like a one-off process that doesn't warrant the extra complexity.
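For completeness, should the process need to be repeated, one way to combine and execute the generated SQL automatically is sketched below. It assumes SQL Server 2017 or later (for STRING_AGG), that it is run inside the target database, and that the two column names are the literal 'Column_1' and 'Column_2'. Using the separator argument of STRING_AGG avoids the leading 'UNION ALL' that would otherwise have to be removed by hand.

```sql
DECLARE @sql nvarchar(max);

-- Build one SELECT per qualifying table and join them with UNION ALL.
-- Note: QUOTENAME returns NULL for inputs longer than 128 characters, and
-- STRING_AGG skips NULLs, so such a table would be silently left out.
SELECT @sql = STRING_AGG(
    CAST(N'SELECT Table_Name = ' + QUOTENAME(T.TABLE_SCHEMA + N'.' + T.TABLE_NAME, '''')
       + N', Column_Name1 = ''Column_1'', Column_Value_1 = [Column_1]'
       + N', Column_Name2 = ''Column_2'', Column_Value_2 = [Column_2]'
       + N' FROM ' + QUOTENAME(T.TABLE_SCHEMA) + N'.' + QUOTENAME(T.TABLE_NAME)
       + N' WHERE ISNULL([Column_1], ''(null)'') <> ISNULL([Column_2], ''(null)'')'
       AS nvarchar(max)),
    N' UNION ALL ')
FROM INFORMATION_SCHEMA.TABLES T
WHERE T.TABLE_TYPE = 'BASE TABLE'
    -- Keep only tables that contain both columns.
    AND (SELECT COUNT(*)
         FROM INFORMATION_SCHEMA.COLUMNS C
         WHERE C.TABLE_SCHEMA = T.TABLE_SCHEMA
           AND C.TABLE_NAME = T.TABLE_NAME
           AND C.COLUMN_NAME IN ('Column_1', 'Column_2')) = 2;

PRINT @sql;  -- inspect the generated statement before trusting it

-- @sql stays NULL when no table qualifies, so guard the execution.
IF @sql IS NOT NULL
    EXEC sys.sp_executesql @sql;
```

As in the template above, the ISNULL comparison treats a real '(null)' string the same as a NULL value, and the UNION ALL only works if the columns have compatible data types across all tables, which the question assumes.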