I have a recurring_bill table which describes a subscription transaction. I have some duplicates that have the same customerId and chargeId. I want to identify these duplicate records. I also want to make sure IsDeleted is false for both records and the LastDay is in the future or NULL (so its an ongoing charge). My current query shows records that do not have a duplicate. Please help me correct my query.
SELECT * FROM recurring_bills b1
WHERE EXISTS
(SELECT * FROM recurring_bill b2 WHERE b1.customerId = b2.customerId
AND b1.chargeId = b2.chargeId
AND (b2.LastDay > '2022-03-10' OR b2.LastDay IS NULL)
AND b2.IsDeleted = 0)
AND (b1.LastDay > '2022-03-10' OR b1.LastDay IS NULL) AND b1.IsDeleted = 0;
Lets say this is the input
customerId | chargeId | LastDay | IsDeleted
1 | charge1 | NULL | 0
1 | charge1 | 05-23-2022 | 0
2 | charge2 | 05-23-2022 | 0
2 | charge2 | 05-23-2021 | 0
3 | charge3 | NULL | 1
3 | charge3 | NULL | 0
The correct output would be
customerId | chargeId | LastDay | IsDeleted
1 | charge1 | NULL | 0
MySQL version is 5.5.59
CodePudding user response:
Try using EXISTS to identify records with the same customerId, chargeId, etc.. but having a different unique record Id. (I didn't know the name of your column, so used "TheUniqueId" for the example).
Note, when mixing AND/OR operators you must use parentheses to ensure expressions are evaluated in the expected order. Otherwise, the query may return the wrong results.
That said, wouldn't you want to see all of the "duplicates", so you could take action on them, if needed? If so, try:
CREATE TABLE recurring_bills (`theUniqueId` int auto_increment primary key , `customerId` int , `chargeId` varchar(7) , `LastDay` date , `IsDeleted` int ) ; INSERT INTO recurring_bills (`customerId`, `chargeId`, `LastDay`, `IsDeleted`) VALUES (1, 'charge1', NULL, 0), (1, 'charge1', '2022-05-23', 0), (2, 'charge2', '2022-05-23', 0), (2, 'charge2', '2021-05-23', 0), (3, 'charge3', NULL, 1), (3, 'charge3', NULL, 0) ;
-- Show all duplicates SELECT * FROM recurring_bills dupe WHERE dupe.IsDeleted = 0 AND ( dupe.LastDay > '2022-03-10' OR dupe.LastDay IS NULL ) AND EXISTS ( SELECT NULL FROM recurring_bills b WHERE b.customerId = dupe.customerId AND b.chargeId = dupe.chargeId AND b.IsDeleted = dupe.IsDeleted AND b.theUniqueId <> dupe.theUniqueId AND ( b.LastDay > '2022-03-10' OR b.LastDay IS NULL ) )
Results:
theUniqueId | customerId | chargeId | LastDay | IsDeleted ----------: | ---------: | :------- | :--------- | --------: 1 | 1 | charge1 | null | 0 2 | 1 | charge1 | 2022-05-23 | 0
If for some reason you really want a single record per chargeId customerId, add a GROUP BY
-- Show single record per dupe combination SELECT * FROM recurring_bills dupe WHERE dupe.IsDeleted = 0 AND ( dupe.LastDay > '2022-03-10' OR dupe.LastDay IS NULL ) AND EXISTS ( SELECT NULL FROM recurring_bills b WHERE b.customerId = dupe.customerId AND b.chargeId = dupe.chargeId AND b.IsDeleted = dupe.IsDeleted AND b.theUniqueId <> dupe.theUniqueId AND ( b.LastDay > '2022-03-10' OR b.LastDay IS NULL ) ) GROUP BY dupe.customerId, dupe.chargeId
Results:
theUniqueId | customerId | chargeId | LastDay | IsDeleted ----------: | ---------: | :------- | :------ | --------: 1 | 1 | charge1 | null | 0
db<>fiddle here