I am a beginner at Python and have a problem comparing two lists. The lists should not be compared for an exact match. Instead, the check should return True if roughly 70% of one list matches the other. The contains() method doesn't help in this case. Here are my lists:
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-D"]
CodePudding user response:
The fuzzywuzzy library in Sahil Desai's answer looks really simple.
Here is an idea that uses only basic built-in functions.
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]
print(len(set(TotalTags).intersection(set(LikedTags))) / len(TotalTags)) # 0.8333333
print(sum([True for x in TotalTags if x in LikedTags]) / len(TotalTags)) # 0.8333333
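Since the question asks for a True/False result at about 70%, the ratio above can be wrapped in a small check. This is only a sketch: the names overlap_ratio and is_match and the default threshold of 0.7 are assumptions for illustration, and it divides by the number of distinct tags in the first list, which equals len(TotalTags) only when there are no duplicates.

```python
def overlap_ratio(total_tags, liked_tags):
    """Fraction of the distinct tags in total_tags that also appear in liked_tags."""
    total = set(total_tags)
    if not total:
        return 0.0  # avoid dividing by zero for an empty list
    return len(total & set(liked_tags)) / len(total)


def is_match(total_tags, liked_tags, threshold=0.7):
    """Return True when the overlap ratio reaches the (assumed) threshold."""
    return overlap_ratio(total_tags, liked_tags) >= threshold


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]

print(is_match(TotalTags, LikedTags))                            # True  (5 of 6 tags)
print(is_match(TotalTags, ["citrus", "orange", "vitamin-D"]))    # False (2 of 6 tags)
```

Which list goes in the denominator changes the result, so it is worth deciding whether "70% matching" means 70% of TotalTags or 70% of LikedTags.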
CodePudding user response:
You can use the fuzzywuzzy Python library:
from fuzzywuzzy import fuzz
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-D"]
per = fuzz.ratio(TotalTags,LikedTags)
per
65
This method compares the two lists character by character. If you only want to match whole items, you can use the Jaccard similarity instead.
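The answer names Jaccard similarity without showing it. A minimal sketch with plain sets follows. The function name jaccard is an assumption, and treating two empty lists as identical is an arbitrary choice.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumption: two empty lists count as identical
    return len(set_a & set_b) / len(union)


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]

# 2 shared tags out of 7 distinct tags overall
print(round(jaccard(TotalTags, LikedTags), 3))  # 0.286
```

Jaccard penalises every tag that appears in only one of the lists, so a short LikedTags compared with a long TotalTags will score low even if every liked tag is present.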
CodePudding user response:
You can use difflib.SequenceMatcher to find the similarity between each pair of words from the two lists, as below (the output only shows pairs with similarity > 70%):
from difflib import SequenceMatcher
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]
for a in LikedTags:
    for b in TotalTags:
        sim = SequenceMatcher(None, a, b).ratio()
        if sim > 0.7:
            print(f'similarity of {a} & {b} : {sim}')
Output:
similarity of citrus & citrus : 1.0
similarity of orange & orange : 1.0
similarity of vitamin-D & vitamin-C : 0.8888888888888888
similarity of vitamin-D & vitamin-A : 0.8888888888888888
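To turn these per-word similarities into a single score for the whole list, one option is to count how many liked tags have at least one close match. This is a sketch, not part of the answer above: the function name fuzzy_match_ratio and the word_threshold parameter are assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_match_ratio(liked_tags, total_tags, word_threshold=0.7):
    """Fraction of liked_tags with at least one sufficiently similar tag in total_tags."""
    if not liked_tags:
        return 0.0  # avoid dividing by zero for an empty list
    matched = 0
    for a in liked_tags:
        # A liked tag counts as matched if any total tag is similar enough
        if any(SequenceMatcher(None, a, b).ratio() > word_threshold for b in total_tags):
            matched += 1
    return matched / len(liked_tags)


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]

score = fuzzy_match_ratio(LikedTags, TotalTags)
print(score)          # 1.0, because "vitamin-D" is close to "vitamin-C"
print(score >= 0.7)   # True
```

Note that fuzzy matching treats "vitamin-D" as a match for "vitamin-C", which may or may not be what you want for tags.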
CodePudding user response:
You can also do something like this with the built-in collections module:
import collections
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]
c = collections.Counter(TotalTags)
c.subtract(LikedTags)  # counts left above zero are tags missing from LikedTags
print(1-c.total()/len(TotalTags))  # Counter.total() requires Python 3.10+
output:
0.8333333333333334
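Counter.total() exists only in Python 3.10 and later, and subtract() can leave negative counts when a liked tag is not in the total list at all, which inflates the result. A possible variant that sidesteps both is sketched below, using the Counter intersection operator (&). The function name multiset_overlap is an assumption.

```python
from collections import Counter


def multiset_overlap(total_tags, liked_tags):
    """Fraction of total_tags covered by liked_tags, counting duplicates."""
    if not total_tags:
        return 0.0  # avoid dividing by zero for an empty list
    # & keeps each tag with the minimum of its two counts, never a negative value
    common = Counter(total_tags) & Counter(liked_tags)
    return sum(common.values()) / len(total_tags)


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]

print(round(multiset_overlap(TotalTags, LikedTags), 4))                          # 0.8333
print(round(multiset_overlap(TotalTags, ["citrus", "orange", "vitamin-D"]), 4))  # 0.3333
```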