Home > OS >  SQL Calculate Jaccard Similarity (IoU)
SQL Calculate Jaccard Similarity (IoU)

Time:11-26

I have 2 Postgres tables:

Table UserItems:

uuid user item
1 1 item1
2 1 item2
3 1 item3
4 2 item1
4 2 item2
5 2 item4
6 3 item2
7 3 item5
8 4 item1
9 4 item5

Table Items

item feature1 feature2 feature3
item1 val_11 val_12 val_13
item2 val_21 val_22 val_23
item3 val_31 val_32 val_33
item4 val_41 val_42 val_43
item4 val_51 val_52 val_53

I'd like to calculate the Jaccard similarity given some input X. So for example, if the input X is item1, I'd like to find every user which has purchased item1, then calculate the Intersection Over Union (IOU) for every other item which any of those users have purchased. The output would then be:

item feature1 feature2 feature3 jaccard
item2 val_21 val_22 val_23 0.67
item3 val_31 val_32 val_33 0.33
item4 val_41 val_42 val_43 0.33
item5 val_51 val_52 val_53 0.33

My approach so far is

SELECT t1.item
FROM UserItems AS t1
JOIN Items AS t2 ON t1.item<>t1.item
WHERE t1.item={input}
GROUP BY t1.item

Then the idea was that I could count the total number of times that each item occurs divided by the total number of groups ... But I'm stumped on how to count the total number of times that each item occurs

CodePudding user response:

here is one way , if I didn't do any mistake :

select i.*, ii.jaccard 
from Items i
join ( 
  select item, count(distinct u.username) * 1.00 / uucount jaccard
  from userItems u
  join ( select tt.username, count(username) over () uucount 
         from userItems tt 
         where item = 'item1'
  ) t on t.username = u.username 
      and  u.item <> 'item1'
  group by u.item , t.uucount 
) ii on ii.item = i.item

db<>fiddle here

  • Related