How do I visualize two columns/lists of trigrams to see if the same word combinations occur in both columns

I have two trigram lists (20 word combinations each), e.g.

l1 = ('hello', 'its', 'me'), ('I', 'need', 'help') ...

l2 = ('I', 'need', 'help'), ('What', 'is', 'this') ...

Now I want to visualize these two lists in one diagram (maybe a pairplot) to see if there are similarities (all 3 words must be the same).

Thank you in advance

CodePudding user response：

INPUT:

l1 = [('hello', 'its', 'me'), ('I', 'need', 'help') ...]
l2 = [('I', 'need', 'help'), ('What', 'is', 'this') ...]

OUTPUT:

results = [[('hello', 'its', 'me'), 1], [('I', 'need', 'help'), 2], [('What', 'is', 'this'), 1]]

merged = l1 + l2
unique = set(merged)
results = []

for tri in unique:
    results.append([tri, merged.count(tri)])

From your description, it seems that this is what you are looking for. Please let me know if any adjustments are needed.
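
As a side note, the same counting can probably be sketched with collections.Counter from the standard library, which avoids calling count() once per unique trigram. The two short lists below are hypothetical sample data standing in for the real 20-item lists, and the names counts and repeated are made up for this illustration.

```python
from collections import Counter

# Hypothetical sample data standing in for the two 20-item lists
l1 = [('hello', 'its', 'me'), ('I', 'need', 'help')]
l2 = [('I', 'need', 'help'), ('What', 'is', 'this')]

# Count every trigram across both lists in a single pass
counts = Counter(l1 + l2)

# Trigrams with a count above 1 occur more than once overall
repeated = [tri for tri, n in counts.items() if n > 1]
print(repeated)  # [('I', 'need', 'help')]
```

Note that a count above 1 only means "both lists" if neither list contains the same trigram twice on its own.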

CodePudding user response：

The answer from Larry the Llama seems to have missed the "see if there are similarities" part, since the solution uses set(), which removes any duplicates.

If you desire a full iteration to find fully similar trigrams:

merged = l1 + l2

results_counter = {}

# Iterate over all the trigrams
for index, trigram in enumerate(merged):
    key = str(trigram)
    # Skip trigrams that were already counted at their first occurrence
    if key in results_counter:
        continue
    results_counter[key] = 0

    # Iterate over this trigram and all the trigrams that come after it
    for second_index in range(index, len(merged)):
        all_same = True

        # Compare word by word; one mismatch means the trigrams differ
        for word_index, word in enumerate(trigram):
            if merged[second_index][word_index] != word:
                all_same = False
                break

        # Add one for every trigram that matched on all three words
        if all_same:
            results_counter[key] += 1

# Print the count for each trigram
for key in results_counter:
    print(key, results_counter[key])
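
To check directly which trigrams the two lists share, a set intersection may be enough, since tuples compare equal only when all three words match. The sketch below also builds a small presence table (one row per trigram, one column per list) that could be printed as text or handed to a plotting library of your choice, for example as a heatmap. The sample lists and the helper name presence_rows are assumptions made for this illustration.

```python
# Hypothetical sample data standing in for the two 20-item lists
l1 = [('hello', 'its', 'me'), ('I', 'need', 'help')]
l2 = [('I', 'need', 'help'), ('What', 'is', 'this')]

# Trigrams that occur in both lists (all three words identical)
shared = set(l1) & set(l2)
print(shared)  # {('I', 'need', 'help')}


def presence_rows(a, b):
    """Return (label, in_a, in_b) for every unique trigram, in first-seen order."""
    rows = []
    for tri in dict.fromkeys(a + b):  # dict keys keep order and drop duplicates
        rows.append((' '.join(tri), tri in a, tri in b))
    return rows


# Plain-text view: an 'x' marks the lists a trigram appears in
print(f"{'trigram':<15} {'l1':^4} {'l2':^4}")
for label, in_l1, in_l2 in presence_rows(l1, l2):
    print(f"{label:<15} {'x' if in_l1 else '-':^4} {'x' if in_l2 else '-':^4}")
```

Rows marked in both columns are the matching trigrams, so the similarities are visible at a glance even without a plot.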