I have a table in Postgres with an hstore column like:
"surface => concrete, ramp => yes, incline => up, highway => footway"
"surface => asphalt, lit => no, source => survey, incline => 12.6%, highway => footway"
"bicycle => yes, surface => asphalt, highway => footway, segregated => no"
Now I would like to analyse the keys among themselves. Therefore I want to know for the whole dataset how often the keys occur in combination pairwise.
The result should look something like this:
| Key 1 | Key 2 | Count |
|---------------------|------------------|------------------|
| surface | source | 1 |
| surface | highway | 3 |
| surface | incline | 2 |
| highway | bicycle | 1 |
.....
CodePudding user response:
Use hstore's each()
to expand it out, then join and aggregate:
with expand as (
select id, k, v
from htab
cross join lateral each(hcol) as c(k, v)
), pairings as (
select t1.id, t1.k as key1, t2.k as key2
from expand t1
join expand t2
on t2.id = t1.id
and t2.k < t1.k
)
select key1, key2, count(*)
from pairings
group by key1, key2
order by key1 desc, key2 desc;
CodePudding user response:
Use the function akeys()
to get all keys of the hstore
column as an array, generate all pairs of the arrays with generate_subscripts()
and group the result by the pairs:
select akeys[i] as key1, akeys[j] as key2, count(*)
from my_table
cross join akeys(hstore_col)
cross join generate_subscripts(akeys, 1) as i
cross join generate_subscripts(akeys, 1) as j
where akeys[i] > akeys[j]
group by 1, 2
order by 1 desc, 2 desc
Test it in db<>fidlle.