I have 2 Postgres tables:
Table UserItems
:
uuid | user | item |
---|---|---|
1 | 1 | item1 |
2 | 1 | item2 |
3 | 1 | item3 |
4 | 2 | item1 |
4 | 2 | item2 |
5 | 2 | item4 |
6 | 3 | item2 |
7 | 3 | item5 |
8 | 4 | item1 |
9 | 4 | item5 |
Table Items
item | feature1 | feature2 | feature3 |
---|---|---|---|
item1 | val_11 | val_12 | val_13 |
item2 | val_21 | val_22 | val_23 |
item3 | val_31 | val_32 | val_33 |
item4 | val_41 | val_42 | val_43 |
item4 | val_51 | val_52 | val_53 |
I'd like to calculate the Jaccard similarity given some input X. So for example, if the input X is item1
, I'd like to find every user which has purchased item1
, then calculate the Intersection Over Union (IOU) for every other item which any of those users have purchased. The output would then be:
item | feature1 | feature2 | feature3 | jaccard |
---|---|---|---|---|
item2 | val_21 | val_22 | val_23 | 0.67 |
item3 | val_31 | val_32 | val_33 | 0.33 |
item4 | val_41 | val_42 | val_43 | 0.33 |
item5 | val_51 | val_52 | val_53 | 0.33 |
My approach so far is
SELECT t1.item
FROM UserItems AS t1
JOIN Items AS t2 ON t1.item<>t1.item
WHERE t1.item={input}
GROUP BY t1.item
Then the idea was that I could count the total number of times that each item occurs divided by the total number of groups ... But I'm stumped on how to count the total number of times that each item occurs
CodePudding user response:
here is one way , if I didn't do any mistake :
select i.*, ii.jaccard
from Items i
join (
select item, count(distinct u.username) * 1.00 / uucount jaccard
from userItems u
join ( select tt.username, count(username) over () uucount
from userItems tt
where item = 'item1'
) t on t.username = u.username
and u.item <> 'item1'
group by u.item , t.uucount
) ii on ii.item = i.item
db<>fiddle here