How to check matching percentage of two string lists?

I am a beginner at Python and have a problem comparing two lists. The lists should not be compared for an exact match. Instead, the check should return True if about 70% of one list matches the other. The contains() method doesn't help in this case. Here are my lists:

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-D"]

CodePudding user response：

The fuzzywuzzy library in Sahil Desai's answer looks really simple.

Here is an idea that uses only built-in functions.

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]

print(len(set(TotalTags).intersection(set(LikedTags))) / len(TotalTags))  # 0.8333333
print(sum([True for x in TotalTags if x in LikedTags]) / len(TotalTags))  # 0.8333333
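To get the True/False result the question asks for, the ratio above can be compared against a threshold. This is only a minimal sketch: the names overlap_ratio and is_match are made up for illustration, the 0.7 default comes from the "about 70%" in the question, and it assumes the tags are unique (duplicates are collapsed by set()).

```python
def overlap_ratio(total_tags, liked_tags):
    """Fraction of the tags in total_tags that also appear in liked_tags (exact matches only)."""
    total = set(total_tags)
    if not total:
        return 0.0  # assumption: an empty tag list matches nothing
    return len(total & set(liked_tags)) / len(total)


def is_match(total_tags, liked_tags, threshold=0.7):
    """Return True when at least `threshold` of total_tags is covered by liked_tags."""
    return overlap_ratio(total_tags, liked_tags) >= threshold


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]

print(is_match(TotalTags, LikedTags))  # True, because 5 of the 6 tags overlap
```

Whether to divide by the size of TotalTags or of LikedTags depends on which list should be "70% covered", so swap the arguments if you need the other direction.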

CodePudding user response：

You can use the fuzzywuzzy Python library:

from fuzzywuzzy import fuzz

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-D"]


per = fuzz.ratio(TotalTags,LikedTags)
per

 65

This method compares the two lists character by character. If you only want to match whole items, you can use the Jaccard similarity instead.
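For reference, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union. A rough sketch with plain sets follows. The function name jaccard_similarity is just for illustration, and treating two empty lists as identical is an assumption.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity: |A intersect B| / |A union B|, computed on whole items."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumption: two empty lists count as identical
    return len(set_a & set_b) / len(union)


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]

# 2 shared tags ("citrus", "orange") out of 7 distinct tags, so about 0.29
print(jaccard_similarity(TotalTags, LikedTags))
```

Note that Jaccard penalizes a difference in list sizes, so a short LikedTags list will score low against a long TotalTags list even when most of its items match.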

CodePudding user response：

You can use difflib.SequenceMatcher to compute the similarity between each pair of words from the two lists, as below. (The output shows only the pairs with similarity > 70%.)

from difflib import SequenceMatcher
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]
for a in LikedTags:
    for b in TotalTags:
        sim = SequenceMatcher(None, a, b).ratio()
        if sim > 0.7:
            print(f'similarity of {a} & {b} : {sim}')

Output:

similarity of citrus & citrus : 1.0
similarity of orange & orange : 1.0
similarity of vitamin-D & vitamin-C : 0.8888888888888888
similarity of vitamin-D & vitamin-A : 0.8888888888888888
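If a single True/False answer is needed, one possible way to build on the loop above is to count how many liked tags have at least one close match and compare that fraction to 70%. This is a sketch only: fuzzy_match_fraction and word_threshold are hypothetical names, and using 0.7 for both the per-word cutoff and the overall cutoff is an assumption.

```python
from difflib import SequenceMatcher


def fuzzy_match_fraction(liked_tags, total_tags, word_threshold=0.7):
    """Fraction of liked_tags that have at least one similar tag in total_tags."""
    if not liked_tags:
        return 0.0  # assumption: nothing to match means no match
    matched = sum(
        1
        for a in liked_tags
        if any(SequenceMatcher(None, a, b).ratio() > word_threshold for b in total_tags)
    )
    return matched / len(liked_tags)


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]

fraction = fuzzy_match_fraction(LikedTags, TotalTags)
print(fraction)         # 1.0, since every liked tag has a close match
print(fraction >= 0.7)  # True
```

Here "vitamin-D" counts as matched because it is close to "vitamin-C" and "vitamin-A". Raise word_threshold if that is too lenient for your tags.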

CodePudding user response：

You can also do something like this with the collections module from the standard library:

import collections
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]
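c = collections.Counter(TotalTags)
c.subtract(LikedTags)  # what remains are the tags in TotalTags that LikedTags lacks
print(1 - c.total() / len(TotalTags))  # Counter.total() requires Python 3.10+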

output:

0.8333333333333334