My query doesn't seems to parse information_type
STEP TO REPRODUCE:
- Take AdventureWork or any other test database
- Right click on your test database >
Tasks > Data Discovery and Classification > Classify Data...
and classify a bunch of data - Now use this query to check your
information_type
column (here I'm using AdventureWork2019):
Paste this select and execute
SELECT
schema_name(O.schema_id) AS schema_name,
O.NAME AS table_name,
C.NAME AS column_name,
[Type] =
CASE
WHEN ct.[name] IN ('varchar', 'char') THEN ct.[name] '(' IIF(c.max_length = -1, 'max', CAST(c.max_length AS VARCHAR(25))) ')'
WHEN ct.[name] IN ('nvarchar','nchar') THEN ct.[name] '(' IIF(c.max_length = -1, 'max', CAST(c.max_length / 2 AS VARCHAR(25))) ')'
WHEN ct.[name] IN ('decimal', 'numeric') THEN ct.[name] '(' CAST(c.precision AS VARCHAR(25)) ', ' CAST(c.scale AS VARCHAR(25)) ')'
WHEN ct.[name] IN ('datetime2') THEN ct.[name] '(' CAST(c.scale AS VARCHAR(25)) ')'
ELSE ct.[name]
END,
information_type,
label,
rank,
rank_desc,
CAST(
CASE
WHEN information_type = 'Contact Info'
THEN 'contact info'
ELSE 'not contact info'
END AS varchar(max)) as Checking_Content
FROM sys.sensitivity_classifications sc
JOIN sys.objects O
ON sc.major_id = O.object_id
JOIN sys.columns C
ON sc.major_id = C.object_id AND sc.minor_id = C.column_id
JOIN sys.types ct ON C.user_type_id = ct.user_type_id
--AND EP.minor_id = C.column_id
order by information_type
As you can see the query is returning not contact info
even when the column information_type
has value Contact Info
schema_name | table_name | column_name | Type | information_type | label | rank | rank_desc | Checking_Content |
---|---|---|---|---|---|---|---|---|
Person | PersonPhone | PhoneNumber | Phone | Contact Info | Confidential | 20 | MEDIUM | not contact info |
Person | PersonPhone | PhoneNumberTypeID | int | Contact Info | Confidential | 20 | MEDIUM | not contact info |
Person | PhoneNumberType | PhoneNumberTypeID | int | Contact Info | Confidential | 20 | MEDIUM | not contact info |
Person | Address | AddressLine1 | nvarchar(60) | Contact Info | Confidential | 20 | MEDIUM | not contact info |
Person | Address | AddressLine2 | nvarchar(60) | Contact Info | Confidential | 20 | MEDIUM | not contact info |
Person | Address | City | nvarchar(30) | Contact Info | Confidential | 20 | MEDIUM | not contact info |
Person | Address | PostalCode | nvarchar(15) | Contact Info | Confidential | 20 | MEDIUM | not contact info |
Production | ProductReview | EmailAddress | nvarchar(50) | Contact Info | Confidential | 20 | MEDIUM | not contact info |
Person | EmailAddress | EmailAddress | nvarchar(50) | Contact Info | Confidential | 20 | MEDIUM | not contact info |
dbo | ErrorLog | UserName | sysname | Credentials | Confidential | 20 | MEDIUM | not contact info |
Person | Password | PasswordHash | varchar(128) | Credentials | Confidential | 20 | MEDIUM | not contact info |
Person | Password | PasswordSalt | varchar(10) | Credentials | Confidential | 20 | MEDIUM | not contact info |
Sales | CreditCard | CreditCardID | int | Credit Card | Confidential | 20 | MEDIUM | not contact info |
Sales | CreditCard | CardType | nvarchar(50) | Credit Card | Confidential | 20 | MEDIUM | not contact info |
Sales | CreditCard | CardNumber | nvarchar(25) | Credit Card | Confidential | 20 | MEDIUM | not contact info |
Sales | CreditCard | ExpYear | smallint | Credit Card | Confidential | 20 | MEDIUM | not contact info |
Sales | PersonCreditCard | CreditCardID | int | Credit Card | Confidential | 20 | MEDIUM | not contact info |
Sales | SalesOrderHeader | CreditCardID | int | Credit Card | Confidential | 20 | MEDIUM | not contact info |
Sales | SalesOrderHeader | CreditCardApprovalCode | varchar(15) | Credit Card | Confidential | 20 | MEDIUM | not contact info |
HumanResources | Employee | BirthDate | date | Date Of Birth | Confidential - GDPR | 20 | MEDIUM | not contact info |
I'm very confused:
What do you think?
CodePudding user response:
According to
exec sp_describe_first_result_set N'select * from sys.sensitivity_classifications sc'
information_type is a sql_variant. Which is not per the docs but should not be an issue as compatible types are converted to sql_variant for comparison without issue. EG
select case when cast(N'Contact Info' as sql_variant) = 'Contact Info' then 1 else 0 end
returns
1
So this is potentially a bug. You can work around by casting explitly to nvarchar. eg
SELECT
schema_name(O.schema_id) AS schema_name,
O.NAME AS table_name,
C.NAME AS column_name,
[Type] =
CASE
WHEN ct.[name] IN ('varchar', 'char') THEN ct.[name] '(' IIF(c.max_length = -1, 'max', CAST(c.max_length AS VARCHAR(25))) ')'
WHEN ct.[name] IN ('nvarchar','nchar') THEN ct.[name] '(' IIF(c.max_length = -1, 'max', CAST(c.max_length / 2 AS VARCHAR(25))) ')'
WHEN ct.[name] IN ('decimal', 'numeric') THEN ct.[name] '(' CAST(c.precision AS VARCHAR(25)) ', ' CAST(c.scale AS VARCHAR(25)) ')'
WHEN ct.[name] IN ('datetime2') THEN ct.[name] '(' CAST(c.scale AS VARCHAR(25)) ')'
ELSE ct.[name]
END,
information_type,
label,
rank,
rank_desc,
CAST(
CASE
WHEN cast(information_type as nvarchar(200)) = 'Contact Info'
THEN 'contact info'
ELSE 'not contact info'
END AS varchar(max)) as Checking_Content
FROM sys.sensitivity_classifications sc
JOIN sys.objects O
ON sc.major_id = O.object_id
JOIN sys.columns C
ON sc.major_id = C.object_id AND sc.minor_id = C.column_id
JOIN sys.types ct ON C.user_type_id = ct.user_type_id
--AND EP.minor_id = C.column_id
order by information_type