I have two tables that are mapped together with a mapping table; the tables are as follows:
- data
- metadata
- matadata_map
metadata_map
is mapping data
together with metadata
.
In the metadata_map
table, I need to map some IDs together. I need to find out which data
ids have X, Y .. N metadata
ids.
The metadata_map
table could look like the following:
--------------------
| data_id| meta_id |
--------------------
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 4 |
--------------------
In this example, I will try to find out which data_id
s has both meta_id
s 2 and 3.
When looking up in the metadata_map
table, I should get the result: (1, and 3)
----------
| data |
----------
| 1 |
| 3 |
----------
since both data_id
1 and 3 has meta_id
2 and 3.
My question is, how can I best match them together in PostgreSQL to get the result? This also requires that it's highly performant because the table has, at the moment of typing, about ~700mil entries.
CodePudding user response:
Alternative - using an array:
select data_id
from metadata_map
group by data_id
having array_agg(meta_id) @> array[2,3];
This query retrieves data_id
s that have both meta_id 2 and meta_id 3 but may have others too.
having array_agg(meta_id order by meta_id) = array[2,3];
would retrieve data_id
s that have exactly meta_id 2 and meta_id 3 and no others.
The pattern can be reused with other array operators too.
CodePudding user response:
You can use grouping
select data_id
from metadata_map
where meta_id in (2,3)
group by data_id
having count(*) = 2
The query assumes metadata_map
contains no dups. Alternatively use count(distinct meta_id)
.