Count frequency of strings found in List2 from List1, matched in any order

Time:11-16

I'm trying to iterate through a list of strings (keyword_list) to find matches in a second list (target_keyword_list), with the words in any order, and return the count.

At the moment my code only finds exact matches, but I'd like to return a match if all of the words are found in any order.

For example, the code below returns a value of 1, but I'd like it to return 3.

Minimal reproducible example below:

keyword_list = ["nike shoes"]
target_keyword_list = ["nike shoes", "shoes nike", "nike airmax shoes"]


check_list = []
for i in keyword_list:
    check_freq = sum(i in s for s in target_keyword_list)
    check_list.append(check_freq)


print(check_list)

Ideally I'd like to modify the existing code if possible.

CodePudding user response:

For each element of keyword_list, check whether its set of words is a subset of the words of each target, then sum the matches over all targets:

keyword_list = ["nike shoes", "airmax shoes"]
target_keyword_list = ["nike shoes", "shoes nike", "nike airmax shoes"]

check_list = []
for keywords in keyword_list:
    check_list.append(
        sum(set(keywords.split()).issubset(target.split())
            for target in target_keyword_list)
    )

print(check_list)  # [3, 1]

It may be easier to follow with a more verbose syntax:

check_list = []
for keywords in keyword_list:
    ks = set(keywords.split())
    count = 0
    for target in target_keyword_list:
        count += ks.issubset(target.split())
    check_list.append(count)
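
If you need this in more than one place, the same subset test could be wrapped in a small function. This is only a sketch: the name count_matches is made up here and is not part of the code above. It also shows why summing works: issubset() returns a bool, and sum() treats True as 1.

```python
def count_matches(keywords, targets):
    """Count the targets that contain every word of `keywords`, in any order."""
    wanted = set(keywords.split())
    # issubset() returns a bool; sum() counts True as 1 and False as 0.
    return sum(wanted.issubset(target.split()) for target in targets)


keyword_list = ["nike shoes", "airmax shoes"]
target_keyword_list = ["nike shoes", "shoes nike", "nike airmax shoes"]

check_list = [count_matches(k, target_keyword_list) for k in keyword_list]
print(check_list)  # [3, 1]
```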

CodePudding user response:

So each keyword is actually a sequence of spaced words, each of which must be present for it to be a match?

This is how I would do it. The list comprehension is kinda pushing the limit for how big I like them to be, and I would be tempted to just break it out into a regular for loop, but since you had one I kept one.

keyword_list = ["nike shoes", "puma shoes", "nike sneakers", "puma sneakers"]
target_keyword_list = ["nike shoes", "shoes nike", "nike airmax shoes", "puma sneakers"]

check_list = []
for keyword in keyword_list:
    check_freq = sum(all(part in target_keyword for part in keyword.split(' ')) for target_keyword in target_keyword_list)
    check_list.append(check_freq)

[3, 0, 0, 1]
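
One thing to be aware of: `part in target_keyword` tests against the whole target string, so it is a substring check and "shoe" would match inside "shoes". If that is not what you want, a possible variation (a sketch with made-up sample strings, not data from the question) is to test against the target's list of words instead:

```python
keyword = "nike shoe"
target = "nike shoes"

# Substring test, as in the comprehension above: "shoe" is found inside "shoes".
substring_match = all(part in target for part in keyword.split())

# Whole-word test: compare against the list of words in the target instead.
target_words = target.split()
word_match = all(part in target_words for part in keyword.split())

print(substring_match, word_match)  # True False
```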

CodePudding user response:

You can use set intersection:

keyword_list = ["nike shoes"]
target_keyword_list = ["nike shoes", "shoes nike",'jack nike', "nike airmax shoes",'shoes for jim']


check_list = {}
for keyword in keyword_list:
    temp0 = set(keyword.split())
    for target in target_keyword_list:
        temp1 = set(target.split())
        intersection = temp0 & temp1
        found = temp0 == intersection
        check_list[target] = found
for k, v in check_list.items():
    print(f'{k} --> {v}')

Output:

nike shoes --> True
shoes nike --> True
jack nike --> False
nike airmax shoes --> True
shoes for jim --> False
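
The dictionary above is keyed by target, so with more than one keyword the later results would overwrite the earlier ones. If you want a count per keyword, as the question asks, one way to adapt it is sketched below. The second keyword "jack nike" and the name counts are added here purely for illustration.

```python
keyword_list = ["nike shoes", "jack nike"]
target_keyword_list = ["nike shoes", "shoes nike", "jack nike",
                       "nike airmax shoes", "shoes for jim"]

counts = {}
for keyword in keyword_list:
    wanted = set(keyword.split())
    # A target matches when the intersection still holds every wanted word.
    counts[keyword] = sum(
        wanted & set(target.split()) == wanted
        for target in target_keyword_list
    )

print(counts)  # {'nike shoes': 3, 'jack nike': 1}
```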

CodePudding user response:

You could try with set difference:

check_list = list()
for keyword in keyword_list:
    check_list.append(len([t for t in target_keyword_list if not set(keyword.split()).difference(set(t.split()))]))
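
In case the `not ...difference(...)` part looks odd: the difference holds the keyword's words that are missing from the target, and an empty set is falsy. A quick sketch with sample strings picked for illustration:

```python
wanted = set("nike shoes".split())

# Every wanted word is present, so nothing is left over: an empty, falsy set.
leftover_match = wanted.difference("nike airmax shoes".split())
# "shoes" is missing from this target, so it stays in the difference.
leftover_miss = wanted.difference("jack nike".split())

print(leftover_match, leftover_miss)  # set() {'shoes'}
print(not leftover_match, not leftover_miss)  # True False
```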

CodePudding user response:

If I understand correctly, you want to count how many strings in target_keyword_list contain at least one word from each key in keyword_list.

For that, you first have to split the strings into separate words, which you can do with the .split() string method:

for kw_item in keyword_list:
    words = kw_item.split()
    # ['nike', 'shoes']

Then go through the second list and check whether each string contains at least one of the words. If you only want to check for the same letter combinations, you can simply check each word against each string in target_keyword_list:

for t_item in target_keyword_list:
    for word in words:
        # substring check: "shoe" also matches inside "shoes"
        if word in t_item:
            count += 1

Then add count to the result before going on to the next item in keyword_list.

If the idea is to check each separate word (treating shoe and shoes as different words), then you'll also have to split every string in target_keyword_list and run a direct comparison between the words in those lists.

for t_item in target_keyword_list:
    t_words = t_item.split()
    for word in words:
        # whole-word check against the target's list of words
        if word in t_words:
            count += 1

This is not the shortest or most elegant logic, but it's easy to understand and to adapt if you need a slightly different solution. Also, once you understand how it works, you can shorten the for and if constructs into one-liners.
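
Putting the fragments above together, a complete version might look like the sketch below. Note the assumption: a target is counted once, and only when every word of the keyword is found among its words (which is what the question asks for), rather than adding 1 per matching word. The flag name found_all is made up here.

```python
keyword_list = ["nike shoes"]
target_keyword_list = ["nike shoes", "shoes nike", "nike airmax shoes"]

check_list = []
for kw_item in keyword_list:
    words = kw_item.split()
    count = 0
    for t_item in target_keyword_list:
        t_words = t_item.split()
        # Assumption: count the target only if no keyword word is missing.
        found_all = True
        for word in words:
            if word not in t_words:
                found_all = False
                break
        if found_all:
            count += 1
    check_list.append(count)

print(check_list)  # [3]
```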
