I have the following tables in a Postgres database:
Table folders
| id | name |
|-----|-----------|
| 1 | folder A |
| 2 | folder B |
Table files
-- Represents the files in folders (large table)
| id | folder_id |
|-----|-----------|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
Table metadata_one
-- Represents some info relating to files (large table)
| id | file_id | label |
|-----|---------|-------|
| 1 | 1 | abc |
| 2 | 1 | def |
| 3 | 2 | abc |
Table metadata_two
-- Represents some other info relating to files (large table)
| id | file_id | label |
|-----|---------|-------|
| 1 | 1 | abc |
| 2 | 1 | def |
| 3 | 2 | abc |
How can I get a list of distinct label values on the folder level?
Desired result
Only distinct label values, across both metadata
tables
| name | labels |
|--------------|-----------|
| folder A | abc,def |
| folder B | abc |
Attempt
Currently this is what I do:
SELECT
folders.name,
string_agg(m1.label, ',') AS m1_labels,
string_agg(m2.label, ',') AS m2_labels
FROM
folders
JOIN files ON
files.folder_id = folders.id
JOIN metadata_one m1 ON
m1.file_id = files.id
JOIN metadata_two m2 ON
m2.file_id = files.id
GROUP BY
folder.name
But this gives me the following:
| name | m1_labels | m2_labels |
|--------------|-----------|-----------|
| folder A | abc,def | abc,def |
| folder B | abc | abc |
I am looking for an optimised solution, since the files
and metadata
tables can be very large.
CodePudding user response:
You can generate a UNION of metadata_one
and metadata_two
in a temporary table and then do your string aggregation like this
WITH metadata_by_folder AS (
SELECT
folders.name,
m1.label AS label
FROM
folders
JOIN files ON
files.folder_id = folders.id
JOIN metadata_one m1 ON
m1.file_id = files.id
UNION SELECT
folders.name,
m2.label AS label
FROM
folders
JOIN files ON
files.folder_id = folders.id
JOIN metadata_two m2 ON
m2.file_id = files.id
)
SELECT
metadata_by_folder.name,
string_agg(metadata_by_folder.label, ',') AS labels
FROM
metadata_by_folder
GROUP BY
metadata_by_folder.name;