So I have two trigram lists (20 word combinations each), e.g.:
l1 = [('hello', 'its', 'me'), ('I', 'need', 'help') ...]
l2 = [('I', 'need', 'help'), ('What', 'is', 'this') ...]
Now I want to visualize these two lists in one diagram (maybe a pairplot) to see whether there are similarities (all 3 words must be the same).
Thank you in advance
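Because a match requires all three words to be identical, and Python tuples compare word by word and are hashable, the trigrams that occur in both lists can be found with a plain set intersection. A minimal sketch, assuming the lists hold tuples as shown above; only the two sample trigrams per list are used, since the full 20-entry lists are not given:

```python
l1 = [('hello', 'its', 'me'), ('I', 'need', 'help')]
l2 = [('I', 'need', 'help'), ('What', 'is', 'this')]

# Tuples are hashable and compare element by element, so the intersection
# keeps only trigrams whose three words all match exactly.
shared = set(l1) & set(l2)
print(shared)  # {('I', 'need', 'help')}
```

Note that the comparison is case-sensitive: ('Hello', 'its', 'me') would not match ('hello', 'its', 'me') unless both lists are lowercased first.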
Answer:
INPUT:
l1 = [('hello', 'its', 'me'), ('I', 'need', 'help') ...]
l2 = [('I', 'need', 'help'), ('What', 'is', 'this') ...]
OUTPUT:
sim = [[('hello', 'its', 'me'), 1], [('I', 'need', 'help'), 2], [('What', 'is', 'this'), 1]]
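# Combine both lists so a trigram that occurs in both is counted twice
merged = l1 + l2
# Each distinct trigram once (note: set iteration order is arbitrary)
unique = set(merged)
results = []
for tri in unique:
    # Count how often this exact trigram occurs across both lists
    results.append([tri, merged.count(tri)])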
From your description, it seems that this is what you are looking for. Please let me know if any adjustments are needed.
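The same counts can also be produced with `collections.Counter`, which tallies `merged` in a single pass instead of calling `merged.count()` once per distinct trigram (negligible for 40 trigrams, but it scales better). A sketch using only the sample trigrams above; unlike iterating over a set, `Counter` keeps keys in order of first appearance (Python 3.7+), so the result matches the order of `sim`:

```python
from collections import Counter

l1 = [('hello', 'its', 'me'), ('I', 'need', 'help')]
l2 = [('I', 'need', 'help'), ('What', 'is', 'this')]

merged = l1 + l2
# Counter counts every trigram in one pass; as a dict subclass it
# remembers the order in which each trigram was first seen.
sim = [[tri, count] for tri, count in Counter(merged).items()]
print(sim)
# [[('hello', 'its', 'me'), 1], [('I', 'need', 'help'), 2], [('What', 'is', 'this'), 1]]
```

Any entry with a count of 2 or more is a trigram that appears more than once across the two lists.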
Answer:
The answer given by Larry the Llama seems to have missed the "see if there are similarities" part, as the solution uses set(), which removes any duplicates.
If you want a full iteration that finds trigrams whose words are all identical:
merged = l1 + l2
results_counter = {}
# Iterate over all the trigrams
for index, trigram in enumerate(merged):
    # Skip trigrams that were already counted at their first occurrence
    if str(trigram) in results_counter:
        continue
    # Iterate over this trigram and all trigrams after it in the list
    for second_index in range(index, len(merged)):
        all_same = True
        # Check whether every word matches the trigram being compared
        for word_index, word in enumerate(trigram):
            if merged[second_index][word_index] != word:
                all_same = False
                break
        if all_same:
            # Add the key with 0 if it is not in results_counter yet, otherwise return its current value
            previous_found = results_counter.setdefault(str(trigram), 0)
            # Add one
            results_counter[str(trigram)] = previous_found + 1
# Print each key and its count
for key in results_counter.keys():
    # Print the count for each trigram
    print(key, results_counter[key])
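Neither answer draws the diagram the question asks for. One option, an assumption rather than something proposed in the thread, is a grouped bar chart with one row per distinct trigram and one bar per list; trigrams with a bar for both lists are the shared ones. The sketch below builds that table with the standard library and prints it as a plain-text chart, so it runs without a plotting library. The helper names `trigram_table` and `print_chart` are hypothetical, and the lists hold only the sample trigrams shown above:

```python
from collections import Counter

l1 = [('hello', 'its', 'me'), ('I', 'need', 'help')]
l2 = [('I', 'need', 'help'), ('What', 'is', 'this')]

def trigram_table(l1, l2):
    """Hypothetical helper: one row (trigram, count in l1, count in l2) per distinct trigram."""
    c1, c2 = Counter(l1), Counter(l2)
    # dict.fromkeys drops duplicates while keeping first-appearance order
    return [(tri, c1[tri], c2[tri]) for tri in dict.fromkeys(l1 + l2)]

def print_chart(rows):
    """Hypothetical helper: render rows as a text bar chart, one '#' per occurrence."""
    label_width = max(len(" ".join(tri)) for tri, _, _ in rows)
    bar_width = max(max(n1, n2) for _, n1, n2 in rows)
    for tri, n1, n2 in rows:
        label = " ".join(tri).ljust(label_width)
        bar1 = ("#" * n1).ljust(bar_width)
        bar2 = ("#" * n2).ljust(bar_width)
        # A trigram with a bar in both columns occurs in both lists
        marker = "  <- in both" if n1 and n2 else ""
        print(f"{label} | l1 {bar1} | l2 {bar2}{marker}")

rows = trigram_table(l1, l2)
print_chart(rows)
```

To draw an actual diagram, the two count columns could be passed to a plotting library, for example as side-by-side bars with matplotlib's `bar` function. A seaborn pairplot, which plots numeric columns against each other pairwise, is a less natural fit for categorical data such as trigrams.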