I have a ecosystems
table, which is self-referencing because each ecosystem can have sub-ecosystems. Each ecosystem can also have a score (which represents how healthy the ecosystem is). The columns are (with example data):
| slug | parent_slug | full_slug | score |
| ---- | ----------- | ----------- | ----- |
| aaa | null | aaa | 1 |
| bbb | aaa | aaa/bbb | 2 |
| ccc | bbb | aaa/bbb/ccc | 4 |
| ddd | null | ddd | 8 |
| eee | ddd | ddd/eee | 16 |
| fff | null | fff | 32 |
The full_slug
column represents the full path from top-level ecosystem down. It is redundant as it can be deduced from the slug
and parent_slug
columns, but it's there.
What I'm trying to achieve is to create a query with the same number of lines, but with a column total_score
which calculate each ecosystem's score PLUS all its children ecosystems' score, recursively. Namely, the output should be:
| slug | total_score |
| ---- | ----------- |
| aaa | 7 |
| bbb | 6 |
| ccc | 4 |
| ddd | 24 |
| eee | 16 |
| fff | 32 |
I started the following query:
WITH top AS (
SELECT
SUM(e.score) as total_score,
CASE instr(e.full_slug, '/') WHEN 0 THEN
e.full_slug
ELSE
substr(e.full_slug, 0, instr(e.full_slug, '/'))
END AS top_level_eco
FROM ecosystems e
GROUP BY top_level_eco
)
SELECT
e.slug,
top.total_score
FROM ecosystems e
INNER JOIN top on top.top_level_eco = e.slug;
But unfortunately it only shows top-level ecosystems and their total score.
CodePudding user response:
I can think of several answers...
The general answer is to use recursion...
WITH
tree AS
(
SELECT
slug AS base_slug,
slug AS current_slug,
score AS score
FROM
ecosystems
UNION ALL
SELECT
t.base_slug,
e.slug,
e.score
FROM
tree t
INNER JOIN
ecosystems e
ON e.parent_slug = t.current_slug
)
SELECT
base_slug AS slug,
SUM(score) AS total_score
FROM
tree
GROUP BY
base_slug
ORDER BY
base_slug
Another option is to make use of the full_slug
in a JOIN
, though that will prohibit use of indexes and can often be significantly slower than the general solution above.
SELECT
e.slug,
SUM(m.score) AS total_score
FROM
ecosystems e
INNER JOIN
ecosystems m -- members
ON '/' || m.full_slug || '/' LIKE '%/' || e.slug || '/%'
GROUP BY
e.slug
ORDER BY
e.slug
A third way would be to unnest
/explode
the full_slug
(that is to create one row for each component of the full_slug
), and then group by the components. SQLite doesn't have that functionality natively, so that too probably gets solved with recursion.
WITH
tree AS
(
SELECT
SUBSTR(full_slug || '/', 1, INSTR(full_slug || '/', '/')-1) AS slug,
SUBSTR(full_slug || '/', INSTR(full_slug || '/', '/') 1) AS path,
score
FROM
ecosystems
UNION ALL
SELECT
SUBSTR(path, 1, INSTR(path, '/')-1) AS slug,
SUBSTR(path, INSTR(path, '/') 1) AS path,
score
FROM
tree
WHERE
tree.path <> ''
)
SELECT
slug,
SUM(score) AS total_score
FROM
tree
GROUP BY
slug
ORDER BY
slug
Demo of all three approaches: