From these data:
SELECT DISTINCT bank_trans.NAME, bank_trans.ID, bank_trans.BVN
FROM bank_trans
WHERE bank_trans.BVN NOT IN (SELECT DISTINCT taxpayers.bvn
FROM taxpayers)
The query above should display just 3(unique) users but it displays all 14 records, thanks.
SELECT COUNT(DISTINCT bank_trans.BVN) AS "full_count"
FROM bank_trans
WHERE bank_trans.BVN NOT IN (SELECT DISTINCT taxpayers.bvn
FROM taxpayers)
the above code works for count displays 3
CodePudding user response:
This is a job for GROUP BY
, I believe.
SELECT bank_trans.BVN, bank_trans.NAME,
COUNT(*) number_of_transactions_from_unknown_taxpayers
FROM bank_trans
WHERE bank_trans.BVN NOT IN (SELECT DISTINCT taxpayers.bvn
FROM taxpayers)
GROUP BY bank_trans.BVN, bank_trans.NAME
By the way, SELECT DISTINCT is the same as
SELECT col1, col2, col3
FROM tbl
GROUP BY col1, col2, col3
It works on all the columns mentioned in the SELECT DISTINCT, not just the first one.
CodePudding user response:
The BVN is probably the Nigerian Bank Verification Number. Every customer is identified by this unique ID.
You select from a transaction table. It s natural that we can find a BVN in there multiple times. The name would usually always be the same for one BVN, unless it changes by marrigae, say from Mary Smith to Mary Miller.
As you want each customer only once in your results, the transaction ID makes no sense. Don't select it. Your query would thus become:
SELECT DISTINCT bank_trans.name, bank_trans.bvn
FROM bank_trans
WHERE bank_trans.bvn NOT IN (SELECT taxpayers.bvn FROM taxpayers)
ORDER BY bank_trans.bvn;
This would get rid of almost all duplicates for a BVN. Only Mary would appear twice, once as Mary Smith, once as Mary Miller. And this may even be desired. If it is not, then decide for one. If it doesn't matter whether to pick the older name or the newer one, then:
SELECT ANY_VALUE(bank_trans.name) AS customer_name, bank_trans.bvn
FROM bank_trans
WHERE bank_trans.bvn NOT IN (SELECT taxpayers.bvn FROM taxpayers)
GROUP BY bank_trans.bvn
ORDER BY bank_trans.bvn;
If you want to pick the last used name, then you need a subquery for the ordering. For instance with ROW_NUMBER
:
SELECT name, bvn
FROM
(
SELECT
name, bvn,
ROW_NUMBER() OVER(PARTITION BY bvn ORDER BY transaction_date DESC) AS rn
FROM bank_trans
WHERE bvn NOT IN (SELECT taxpayers.bvn FROM taxpayers)
)
WHERE rn = 1
ORDER BY bvn;
(This last query also gives you the option to select more data from the last transaction, e.g. the last transaction ID.)