How to efficiently find duplicate database entries (HSQL)-CodePudding

I have a large table of names. Each entry has a unique ID, a FORENAME, and a SURNAME. If different IDs have the same forename and surname that is not necessarily an error, but it often is, so I want a query that lists the suspects. Because the table is large I don't want to do this as a join, or rather: I don't want the query to be quadratic in table size. If I were coding this in a functional programming language, and my table was just a list of triples, I would do this as follows (in loglinear time):

sort by surname
group neighbouring elements with same surname
throw out the singleton groups [which can be done without using the length function]
within those groups (map), sort by forename, then group, then throw out singletons again
flatten those groups (twice) to get back to list of triples

How would I do this as an SQL query? The above may not be a good algo to run on DBs, but then what would be? I am not too fussed about the precise format of the output, as that output is just for human consumption. I should add that my database is in OpenOffice, so it is based on HSQL.

CodePudding user response：

You use GROUP BY and HAVING in SQL to do what you want. For example:

SELECT FIRSTNAME, LASTNAME FROM CUSTOMER GROUP BY LASTNAME, FIRSTNAME HAVING COUNT(*) > 1 ORDER BY LASTNAME

Note the ORDER BY clause is optional.