I have a table containing approx 8,000 rows.
It has 15 columns in it and one of those is a "reference number".
My goal is to find all rows that match each other, and for the matching rows, find out what the reference number is.
The reference number is a bit random and doesn't really have a relationship to the rest of the data.
This is the SQL I have so far which gives me all the matching rows and count:
select
count(*),
component,
privileges,
protocol,
authority,
score,
means,
difficulty,
hierarchy,
interaction,
scope,
conf,
integrity,
availability,
version
from
data
group by
component,
privileges,
protocol,
authority,
score,
means,
difficulty,
hierarchy,
interaction,
scope,
conf,
integrity,
availability,
version
having count(*) >1
order by count(*) desc;
The row I have missed off the query, because I cannot get this to work as intended if I include it is reference_number
I require the reference number for every row that comes through as a hit for this query. So, if the count(*) is 23 for one set of columns that matches the query, what is the reference number for all 23. That's my problem and I'm not sure if I can do that with SQL.
I can make new tables, or views, I felt like a subquery might help at first, e.g.
select reference_number from
(select
count(*),
component,
privileges,
protocol....
etc
But that fails with:
ERROR: subquery in FROM must have an alias
Any help appreciated. (I am using Postgresql if that helps).
CodePudding user response:
One option is to use the string_agg() function.
The function signature is
string_agg ( value text, delimiter text ) → text. Concatenates the non-null input values into a string. Each value after the first is preceded by the corresponding delimiter (if it's not null).
So in your original query, add a column with string_agg(reference_number, ','). This will give you a comma-separate list of reference numbers for each row.