SQL Calculate Jaccard Similarity (IoU)-CodePudding

I have 2 Postgres tables:

Table UserItems:

uuid	user	item
1	1	item1
2	1	item2
3	1	item3
4	2	item1
4	2	item2
5	2	item4
6	3	item2
7	3	item5
8	4	item1
9	4	item5

Table Items

item	feature1	feature2	feature3
item1	val_11	val_12	val_13
item2	val_21	val_22	val_23
item3	val_31	val_32	val_33
item4	val_41	val_42	val_43
item4	val_51	val_52	val_53

I'd like to calculate the Jaccard similarity given some input X. So for example, if the input X is item1, I'd like to find every user which has purchased item1, then calculate the Intersection Over Union (IOU) for every other item which any of those users have purchased. The output would then be:

item	feature1	feature2	feature3	jaccard
item2	val_21	val_22	val_23	0.67
item3	val_31	val_32	val_33	0.33
item4	val_41	val_42	val_43	0.33
item5	val_51	val_52	val_53	0.33

My approach so far is

SELECT t1.item
FROM UserItems AS t1
JOIN Items AS t2 ON t1.item<>t1.item
WHERE t1.item={input}
GROUP BY t1.item

Then the idea was that I could count the total number of times that each item occurs divided by the total number of groups ... But I'm stumped on how to count the total number of times that each item occurs

CodePudding user response：

here is one way , if I didn't do any mistake :

select i.*, ii.jaccard 
from Items i
join ( 
  select item, count(distinct u.username) * 1.00 / uucount jaccard
  from userItems u
  join ( select tt.username, count(username) over () uucount 
         from userItems tt 
         where item = 'item1'
  ) t on t.username = u.username 
      and  u.item <> 'item1'
  group by u.item , t.uucount 
) ii on ii.item = i.item

db<>fiddle here