How do I visualize two columns/lists of trigrams to see if the same wordcombination occur in both co

Time:12-08

I have two lists of trigrams (20 word combinations each), e.g.:

l1 = ('hello', 'its', 'me'), ('I', 'need', 'help') ...

l2 = ('I', 'need', 'help'), ('What', 'is', 'this') ...

Now I want to visualize these two lists in one diagram (maybe a pairplot) to see if there are similarities (all 3 words must be the same).

Thank you in advance

CodePudding user response:

INPUT:

l1 = [('hello', 'its', 'me'), ('I', 'need', 'help') ...]
l2 = [('I', 'need', 'help'), ('What', 'is', 'this') ...]

OUTPUT:

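sim = [[('hello', 'its', 'me'), 1], [('I', 'need', 'help'), 2], [('What', 'is', 'this'), 1]]

CODE:

merged = l1 + l2
unique = set(merged)
sim = []

# Count how often each distinct trigram occurs across both lists.
# A set has no fixed order, so the order of the entries in sim may vary.
for tri in unique:
    sim.append([tri, merged.count(tri)])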

From your description, it seems that this is what you are looking for. Please let me know if any adjustments are needed.
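
As a possible alternative to calling merged.count() inside a loop, the same counts can be sketched with collections.Counter from the standard library, and a set intersection gives the trigrams that occur in both lists. The sample data below is shortened from the question, and the names counts and shared are only illustrative.

```python
from collections import Counter

# Sample data shortened from the question (the real lists hold 20 trigrams each)
l1 = [('hello', 'its', 'me'), ('I', 'need', 'help')]
l2 = [('I', 'need', 'help'), ('What', 'is', 'this')]

# Count every trigram across both lists in a single pass
counts = Counter(l1 + l2)

# Trigrams present in both lists; tuples compare equal only if all three words match
shared = set(l1) & set(l2)

print(counts)
print(shared)
```

Note that a count of 2 in counts can also come from a trigram repeated within one list, whereas shared only reports trigrams that appear in both.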

CodePudding user response:

The answer given by Larry the Llama seems to have missed the "see if there are similarities" part, as the solution uses set(), which removes any duplicates.

If you want a full iteration that finds fully identical trigrams:

merged = l1 + l2

results_counter = {}

# Iterate over all the trigrams
for index, trigram in enumerate(merged):
    # Skip trigrams that were already counted at an earlier position
    if str(trigram) in results_counter:
        continue

    # Compare with this trigram and every trigram that comes after it in the list
    for second_index in range(index, len(merged)):
        all_same = True

        # Check word by word whether both trigrams are the same
        for word_index, word in enumerate(trigram):
            if merged[second_index][word_index] != word:
                all_same = False
                break

        if all_same:
            # Add the key with a count of 0 if it is not in results_counter yet
            results_counter.setdefault(str(trigram), 0)
            # Add one
            results_counter[str(trigram)] += 1

# Print the count for each trigram
for key in results_counter.keys():
    print(key, results_counter[key])
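
Since the question asks for a way to see the overlap, here is a minimal text-only sketch that prints one row per distinct trigram and marks whether it occurs in l1, in l2, or in both. The helper name presence_rows and the sample data are assumptions for illustration; the resulting rows could also be passed to a plotting library.

```python
# Sample data shortened from the question
l1 = [('hello', 'its', 'me'), ('I', 'need', 'help')]
l2 = [('I', 'need', 'help'), ('What', 'is', 'this')]

def presence_rows(first, second):
    """Return (trigram, in_first, in_second) for each distinct trigram,
    in order of first appearance (presence_rows is a hypothetical helper)."""
    first_set, second_set = set(first), set(second)
    rows = []
    # dict.fromkeys removes duplicates while keeping insertion order
    for trigram in dict.fromkeys(first + second):
        rows.append((trigram, trigram in first_set, trigram in second_set))
    return rows

rows = presence_rows(l1, l2)

# Print a small table: 'x' means the trigram occurs in that list
print(f"{'trigram':<20} {'l1':^4} {'l2':^4}")
for trigram, in_first, in_second in rows:
    label = ' '.join(trigram)
    mark_first = 'x' if in_first else '.'
    mark_second = 'x' if in_second else '.'
    print(f"{label:<20} {mark_first:^4} {mark_second:^4}")
```

Rows marked in both columns are the trigrams where all three words match across the two lists.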