How to check matching percentage of two string lists?

Time:06-23

I am a beginner at Python and have a problem comparing two lists. The lists should not be compared for exact equality; instead, the comparison should return true if one list matches the other by about 70%. The contains() method doesn't help in this case. Here are my lists:

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-D"]

CodePudding user response:

The fuzzywuzzy library in Sahil Desai's answer looks really simple.

Here is an idea that uses only basic built-in functions.

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]

print(len(set(TotalTags).intersection(set(LikedTags))) / len(TotalTags))  # 0.8333333
print(sum([True for x in TotalTags if x in LikedTags]) / len(TotalTags))  # 0.8333333
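
Since the question asks for a true/false result at about 70%, the ratio above can be wrapped in a small helper. This is only a sketch: the name is_match, the default threshold of 0.7, and the choice to divide by the length of the full tag list (as the two lines above do) are assumptions, not something stated in the question.

```python
def is_match(liked, total, threshold=0.7):
    """Return True if at least `threshold` of the tags in `total` also appear in `liked`."""
    if not total:
        return False  # assumption: an empty reference list never matches
    liked_set = set(liked)  # set lookup avoids rescanning the list for every tag
    shared = sum(1 for tag in total if tag in liked_set)
    return shared / len(total) >= threshold


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]

print(is_match(["citrus", "orange", "vitamin-C", "sweet", "yellow"], TotalTags))  # True  (5/6)
print(is_match(["citrus", "orange", "vitamin-D"], TotalTags))                     # False (2/6)
```

If the percentage should be measured against the liked tags instead, divide by len(liked) and count the liked tags found in total.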

CodePudding user response:

You can use the fuzzywuzzy Python library:

from fuzzywuzzy import fuzz

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-D"]


per = fuzz.ratio(TotalTags,LikedTags)
per

 65

This method compares the two lists character by character. If you only want to match whole items, you can use the Jaccard similarity method instead.
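
The Jaccard similarity mentioned above is the size of the intersection of the two sets divided by the size of their union. A minimal sketch using plain Python sets follows; the function name jaccard_similarity and the handling of two empty lists are assumptions made for illustration.

```python
def jaccard_similarity(a, b):
    """Return |A & B| / |A | B| for two iterables of hashable items."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumption: two empty lists are treated as identical
    return len(set_a & set_b) / len(union)


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]

# 2 shared tags out of 7 distinct tags overall
score = jaccard_similarity(TotalTags, LikedTags)
print(score)
```

Note that Jaccard similarity penalizes items that appear in either list but not both, so it gives a lower score here than a ratio computed against only one of the lists.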

CodePudding user response:

You can use difflib.SequenceMatcher to compute the similarity between each pair of words from the two lists, as shown below. (The output only shows pairs with similarity > 70%.)

from difflib import SequenceMatcher
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]
for a in LikedTags:
    for b in TotalTags:
        sim = SequenceMatcher(None, a, b).ratio()
        if sim > 0.7:
            print(f'similarity of {a} & {b} : {sim}')

Output:

similarity of citrus & citrus : 1.0
similarity of orange & orange : 1.0
similarity of vitamin-D & vitamin-C : 0.8888888888888888
similarity of vitamin-D & vitamin-A : 0.8888888888888888
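
To turn these per-word scores into one percentage for the whole list, one option is to count how many liked tags have at least one sufficiently similar tag in the full list. The sketch below assumes that interpretation; the function name fuzzy_match_fraction and both thresholds are hypothetical choices rather than part of the answer above.

```python
from difflib import SequenceMatcher


def fuzzy_match_fraction(liked, total, word_threshold=0.7):
    """Fraction of tags in `liked` whose best similarity to any tag in `total` exceeds `word_threshold`."""
    if not liked:
        return 0.0  # assumption: nothing to match means no match
    matched = 0
    for a in liked:
        # a liked tag counts once, no matter how many tags in `total` resemble it
        if any(SequenceMatcher(None, a, b).ratio() > word_threshold for b in total):
            matched += 1
    return matched / len(liked)


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]

fraction = fuzzy_match_fraction(LikedTags, TotalTags)
print(fraction >= 0.7)  # list-level decision at the 70% mark
```

With the lists above every liked tag finds a close partner ("vitamin-D" pairs with "vitamin-C"), so the fraction is 1.0 and the check passes.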

CodePudding user response:

You can also do something like this with the standard-library collections module:

import collections
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]
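c = collections.Counter(TotalTags)
# subtract one count per liked tag; tags missing from LikedTags keep a count of 1
c.subtract(LikedTags)
# Counter.total() requires Python 3.10+
print(1-c.total()/len(TotalTags))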

output:

0.8333333333333334