Home > Back-end >  How to query across multiple rows in postgres
How to query across multiple rows in postgres

Time:01-23

I'm saving dynamic objects (objects of which I do not know the type upfront) using the following 2 tables in Postgres:

CREATE TABLE IF NOT EXISTS objects(
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,

    name TEXT NOT NULL,

    PRIMARY KEY(id)
);

CREATE TABLE IF NOT EXISTS object_values(
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    event_id UUID NOT NULL,

    param TEXT NOT NULL,
    value TEXT NOT NULL,
);

So for instance, if I have the following objects:

dog = [
  { breed: "poodle", age: 15, ...},
  { breed: "husky", age: 9, ...},
}
monitors = [
  { manufacturer: "dell", ...},
}

It will live in the DB as follows:

-- objects
| id | user_id | name    |
|----|---------|---------|
| 1  | 1       | dog     |
| 2  | 2       | dog     |
| 3  | 1       | monitor |

-- object_values
| id | event_id | param        | value  |
|----|----------|--------------|--------|
| 1  | 1        | breed        | poodle |
| 2  | 1        | age          | 15     |
| 3  | 2        | breed        | husky  |
| 4  | 2        | age          | 9      |
| 5  | 3        | manufacturer | dell   |

Note, these tables are big (hundreds of millions). Generally optimised for writing. What would be a good way of querying/filtering objects based on multiple object params? For instance: Select the number of all husky dogs above the age of 10 per unique user.

I also wonder whether it would have been better to denormalise the tables and collapse the params onto a JSON column (and use gin indexes).

Are there any standards I can use here?

CodePudding user response:

"Select the number of all husky dogs above the age of 10 per unique user" - The following query would do it.

SELECT user_id, COUNT(DISTINCT event_id) AS num_husky_dogs_older_than_10
FROM       objects       o
INNER JOIN object_values ov
        ON o.id_ = ov.event_id
       AND o.name_ = 'dog'
GROUP BY o.user_id
HAVING MAX(CASE WHEN ov.param = 'age' 
                 AND ov.value_::integer >= 10 THEN 1 END) = 1
   AND MAX(CASE WHEN ov.param = 'breed'
                 AND ov.value_ = 'husky'      THEN 1 END) = 1;

Since your queries are most likely affected by having always the same JOIN operation between these two tables on the same fields, would be good to have a indices on:

  • the fields you join on ("objects.id", "object_values.event_id")
  • the fields you filter on ("objects.name", "object_values.param", "object_values.value_")

Check the demo here.

  • Related