Home > database >  SQL finding mismatches within the same column
SQL finding mismatches within the same column

Time:04-29

SQL Noob here. I'm struggling with a (hopefully) simple SQL query and would appreciate if someone could help me understand if what I'm trying to do is even possible. I already completed the task outside SQL so this is just my curiosity.

Task is simple. I have a Table with millions of Codes. There are 12 digit codes always starting with '00000' and their 7 digit equivalents. (like 000001234567 and 1234567) I'd need to identify all 12 digit codes that DO NOT have equivalent 7 digit code within the same column. Keep in mind there can be other 7 digit codes without a 12 digit equivalent (like 6789012)

Any help appreciated.

This is the data of the table :

Code
000001234567
1234567
000002345678
2345678
000003456789
000004567891
000005678912
6789012
...

and the expected output is :

Code
000003456789
000004567891
000005678912
...

CodePudding user response:

You can use TRIM() function and HAVING clause along with GROUP BY such as

 SELECT CONCAT('00000',TRIM(LEADING '0' FROM code))
   FROM t
  GROUP BY CONCAT('00000',TRIM(LEADING '0' FROM code)) 
  HAVING COUNT(TRIM(LEADING '0' FROM code))=1 

Demo

CodePudding user response:

There's a couple of ways of doing this. You can use functions such as Trim to remove the leading 0's or RIGHT to take the last 7 digits and compare. As an example you could do

SELECT CODE
FROM TABLE
WHERE LENGTH(CODE) = 12
AND TRIM(LEADING '0' FROM CODE) NOT IN 
   (SELECT CODE
    FROM TABLE
    WHERE LENGTH(CODE) = 7
    )

CodePudding user response:

you can try this assuming that the longer string has 12chars with '00000' in the beginning

 select ID from TABLE t1
      where 
      select count (*) from TABLE t2 where substring(t1.ID, 6, 12) <> t2.ID) = 0

CodePudding user response:

Try to make fix length and grouping it and filter count (not compulsory to make fix length because integer 01=1 so

SELECT 
REPLICATE('0',12-LEN(RTRIM(code)))   RTRIM(code),COUNT(*) 
from TableName 
GROUP BY REPLICATE('0',12-LEN(RTRIM(code)))   RTRIM(code)
HAVING COUNT(*) <2

CodePudding user response:

You will need a SELF JOIN and do some string manipulation for the join then just filter the codes which produces NULL for their equivalent codes.

WITH codes AS
( 
  SELECT '000001234567' AS code UNION ALL 
  SELECT '1234567' AS code UNION ALL 
  SELECT '000002345678' AS code UNION ALL 
  SELECT '2345678' AS code UNION ALL 
  SELECT '000003456789' AS code UNION ALL 
  SELECT '000004567891' AS code UNION ALL 
  SELECT '000005678912' AS code  
)
 SELECT 
   c.code,
   cc.code AS equivalent_code
 FROM 
   codes c
   LEFT JOIN codes cc ON RIGHT(c.code, 7) = cc.code
 WHERE CHAR_LENGTH(c.code) > 7
 AND cc.code IS NULL

CodePudding user response:

SELECT 
   CONVERT(code , UNSIGNED) as c,
   COUNT(*)
FROM table1
GROUP BY CONVERT(code , UNSIGNED)
HAVING COUNT(*)<>2
;

see: DBFIDDLE

The maximum value of an unsigned number is 4294967295, so converting to UNSIGNED is long enough to hold your numbers, because they have maximal 7 digits.

  • Related