I have a large table of names. Each entry has a unique ID, a FORENAME, and a SURNAME. If different IDs have the same forename and surname that is not necessarily an error, but it often is, so I want a query that lists the suspects. Because the table is large I don't want to do this as a join, or rather: I don't want the query to be quadratic in table size. If I were coding this in a functional programming language, and my table was just a list of triples, I would do this as follows (in loglinear time):
- sort by surname
- group neighbouring elements with same surname
- throw out the singleton groups [which can be done without using the length function]
- within those groups (map), sort by forename, then group, then throw out singletons again
- flatten those groups (twice) to get back to list of triples
How would I do this as an SQL query? The above may not be a good algo to run on DBs, but then what would be? I am not too fussed about the precise format of the output, as that output is just for human consumption. I should add that my database is in OpenOffice, so it is based on HSQL.
CodePudding user response:
You use GROUP BY and HAVING in SQL to do what you want. For example:
SELECT FIRSTNAME, LASTNAME FROM CUSTOMER GROUP BY LASTNAME, FIRSTNAME HAVING COUNT(*) > 1 ORDER BY LASTNAME
Note the ORDER BY clause is optional.