Home > Mobile >  select distinct statement not working doesnt display unique values instead shows all
select distinct statement not working doesnt display unique values instead shows all

Time:11-24

From these data: enter image description here

  SELECT DISTINCT bank_trans.NAME, bank_trans.ID, bank_trans.BVN 
    FROM bank_trans 
    WHERE bank_trans.BVN NOT IN (SELECT DISTINCT taxpayers.bvn 
                                 FROM taxpayers)

The query above should display just 3(unique) users but it displays all 14 records, thanks.

SELECT COUNT(DISTINCT bank_trans.BVN) AS "full_count" 
FROM bank_trans 
WHERE bank_trans.BVN NOT IN (SELECT DISTINCT taxpayers.bvn 
                             FROM taxpayers) 

the above code works for count displays 3

CodePudding user response:

This is a job for GROUP BY, I believe.

SELECT bank_trans.BVN, bank_trans.NAME,
       COUNT(*) number_of_transactions_from_unknown_taxpayers
  FROM bank_trans 
 WHERE bank_trans.BVN NOT IN (SELECT DISTINCT taxpayers.bvn 
                                 FROM taxpayers)
 GROUP BY  bank_trans.BVN, bank_trans.NAME

By the way, SELECT DISTINCT is the same as

SELECT col1, col2, col3
  FROM tbl
 GROUP BY col1, col2, col3 

It works on all the columns mentioned in the SELECT DISTINCT, not just the first one.

CodePudding user response:

The BVN is probably the Nigerian Bank Verification Number. Every customer is identified by this unique ID.

You select from a transaction table. It s natural that we can find a BVN in there multiple times. The name would usually always be the same for one BVN, unless it changes by marrigae, say from Mary Smith to Mary Miller.

As you want each customer only once in your results, the transaction ID makes no sense. Don't select it. Your query would thus become:

SELECT DISTINCT bank_trans.name, bank_trans.bvn
FROM bank_trans 
WHERE bank_trans.bvn NOT IN (SELECT taxpayers.bvn FROM taxpayers)
ORDER BY bank_trans.bvn;

This would get rid of almost all duplicates for a BVN. Only Mary would appear twice, once as Mary Smith, once as Mary Miller. And this may even be desired. If it is not, then decide for one. If it doesn't matter whether to pick the older name or the newer one, then:

SELECT ANY_VALUE(bank_trans.name) AS customer_name, bank_trans.bvn
FROM bank_trans 
WHERE bank_trans.bvn NOT IN (SELECT taxpayers.bvn FROM taxpayers)
GROUP BY bank_trans.bvn
ORDER BY bank_trans.bvn;

If you want to pick the last used name, then you need a subquery for the ordering. For instance with ROW_NUMBER:

SELECT name, bvn
FROM
(
  SELECT
    name, bvn,
    ROW_NUMBER() OVER(PARTITION BY bvn ORDER BY transaction_date DESC) AS rn
  FROM bank_trans
  WHERE bvn NOT IN (SELECT taxpayers.bvn FROM taxpayers)
)
WHERE rn = 1
ORDER BY bvn;

(This last query also gives you the option to select more data from the last transaction, e.g. the last transaction ID.)

  • Related