Home > Software design >  SQLite Sum over children in self-referencing table
SQLite Sum over children in self-referencing table

Time:10-17

I have a ecosystems table, which is self-referencing because each ecosystem can have sub-ecosystems. Each ecosystem can also have a score (which represents how healthy the ecosystem is). The columns are (with example data):

| slug | parent_slug | full_slug   | score |
| ---- | ----------- | ----------- | ----- |
| aaa  | null        | aaa         | 1     |
| bbb  | aaa         | aaa/bbb     | 2     |
| ccc  | bbb         | aaa/bbb/ccc | 4     |
| ddd  | null        | ddd         | 8     |
| eee  | ddd         | ddd/eee     | 16    |
| fff  | null        | fff         | 32    |

The full_slug column represents the full path from top-level ecosystem down. It is redundant as it can be deduced from the slug and parent_slug columns, but it's there.

What I'm trying to achieve is to create a query with the same number of lines, but with a column total_score which calculate each ecosystem's score PLUS all its children ecosystems' score, recursively. Namely, the output should be:

| slug | total_score |
| ---- | ----------- |
| aaa  | 7           |
| bbb  | 6           |
| ccc  | 4           |
| ddd  | 24          |
| eee  | 16          |
| fff  | 32          |

I started the following query:

WITH top AS (
    SELECT
        SUM(e.score) as total_score,
        CASE instr(e.full_slug, '/') WHEN 0 THEN
            e.full_slug
        ELSE
            substr(e.full_slug, 0, instr(e.full_slug, '/'))
        END AS top_level_eco
    FROM ecosystems e
    GROUP BY top_level_eco
)
SELECT
    e.slug,
    top.total_score
FROM ecosystems e
INNER JOIN top on top.top_level_eco = e.slug;

But unfortunately it only shows top-level ecosystems and their total score.

CodePudding user response:

I can think of several answers...

The general answer is to use recursion...

WITH
  tree AS
(
  SELECT
   slug   AS base_slug,
   slug   AS current_slug,
   score  AS score
  FROM
   ecosystems

  UNION ALL

  SELECT
    t.base_slug,
    e.slug,
    e.score
  FROM
    tree        t
  INNER JOIN
    ecosystems  e
      ON e.parent_slug = t.current_slug
)
SELECT
  base_slug    AS slug,
  SUM(score)   AS total_score
FROM
  tree
GROUP BY
  base_slug
ORDER BY
  base_slug

Another option is to make use of the full_slug in a JOIN, though that will prohibit use of indexes and can often be significantly slower than the general solution above.

SELECT
  e.slug,
  SUM(m.score)   AS total_score
FROM
  ecosystems    e
INNER JOIN
  ecosystems    m  -- members
    ON '/' || m.full_slug || '/' LIKE '%/' || e.slug || '/%'
GROUP BY
  e.slug
ORDER BY
  e.slug

A third way would be to unnest/explode the full_slug (that is to create one row for each component of the full_slug), and then group by the components. SQLite doesn't have that functionality natively, so that too probably gets solved with recursion.

WITH
  tree AS
(
  SELECT
    SUBSTR(full_slug || '/', 1, INSTR(full_slug || '/', '/')-1)   AS slug,
    SUBSTR(full_slug || '/',    INSTR(full_slug || '/', '/') 1)   AS path,
    score
  FROM
    ecosystems

  UNION ALL

  SELECT
    SUBSTR(path, 1, INSTR(path, '/')-1)   AS slug,
    SUBSTR(path,    INSTR(path, '/') 1)   AS path,
    score
  FROM
    tree
  WHERE
    tree.path <> ''
)
SELECT
  slug,
  SUM(score)    AS total_score
FROM
  tree
GROUP BY
  slug
ORDER BY
  slug

Demo of all three approaches:

  • Related