I have two lists of words, and I want to compute a value based on how many words they have in common.
actual = ['I', 'am', 'a', 'student', 'from', 'computer', 'science', 'department']
predicted = ['computer', 'and', 'science', 'department']
The two lists above are the samples I want to compare.
For example, 3 words in predicted also occur in actual.
The expected output is the length of actual, the length of predicted, and the number of words they share, which is 3 in this case:
length of actual = 8
length of predicted = 4
similar word count = 3
CodePudding user response:
My idea is to convert actual and predicted into sets and then take their intersection. Note, however, that this does not count multiple occurrences, because sets do not contain duplicates: the similar word count for actual = ['computer', 'computer', 'computer'] and predicted = ['computer', 'computer', 'computer'] is 1.
actual = ['I', 'am', 'a', 'student', 'from', 'computer', 'science', 'department']
predicted = ['computer', 'and', 'science', 'department']
print("length of actual = ", len(actual))
print("length of predicted = ", len(predicted))
print("similar word count = ", len(set(actual).intersection(set(predicted))))
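To make the duplicate caveat concrete, here is a minimal sketch that wraps the set intersection in a helper. The name unique_overlap is hypothetical, chosen only for illustration.

```python
def unique_overlap(actual, predicted):
    """Count the distinct words that appear in both lists.

    Duplicates collapse, because each list is converted to a set first.
    """
    return len(set(actual) & set(predicted))


actual = ['I', 'am', 'a', 'student', 'from', 'computer', 'science', 'department']
predicted = ['computer', 'and', 'science', 'department']

print(unique_overlap(actual, predicted))                   # 3
print(unique_overlap(['computer'] * 3, ['computer'] * 3))  # 1, not 3
```

The second call shows the limitation: three repeated words on each side still count as a single match.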
CodePudding user response:
linuskmr's answer is pretty good if you only want to count the same word once in the similarity calculation.
However, if you want to count the same word multiple times, you can use collections.Counter rather than sets. Applying & to two counters keeps only the words that appear in both lists, each with the minimum of its two counts. Those counts are stored in the values of the resulting counter, and we want the total number of common occurrences, so we combine .values() with sum():
from collections import Counter
result = sum((Counter(actual) & Counter(predicted)).values())
print(result)
This outputs:
3
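As a rough sketch, the Counter approach can be wrapped in a helper that returns all three values the question asks for. The name overlap_report is hypothetical and not part of any library.

```python
from collections import Counter


def overlap_report(actual, predicted):
    """Return (len(actual), len(predicted), shared word count).

    The shared count respects duplicates: a word contributes the
    minimum of its counts in the two lists.
    """
    common = Counter(actual) & Counter(predicted)
    return len(actual), len(predicted), sum(common.values())


actual = ['I', 'am', 'a', 'student', 'from', 'computer', 'science', 'department']
predicted = ['computer', 'and', 'science', 'department']

print(overlap_report(actual, predicted))                   # (8, 4, 3)
print(overlap_report(['computer'] * 3, ['computer'] * 3))  # (3, 3, 3)
print(overlap_report(['computer'] * 3, ['computer']))      # (3, 1, 1)
```

Unlike a set intersection, repeated words are counted as many times as they occur in both lists.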