I'm trying to iterate through a list of strings (keyword_list) to find matches in a second list (target_keyword_list) in any order and return the count.
At the moment my code only finds exact matches, but I'd like to return a match if all of the words are found in any order.
For example, the code below returns a value of 1, but I'd like it to return 3.
Minimal reproducible example below:
keyword_list = ["nike shoes"]
target_keyword_list = ["nike shoes", "shoes nike", "nike airmax shoes"]
check_list = []
for i in keyword_list:
    check_freq = sum(i in s for s in target_keyword_list)
    check_list.append(check_freq)
print(check_list)
Ideally I'd like to modify the existing code if possible.
CodePudding user response:
For each element of keyword_list, verify that its set of words is a subset of the set of words of a target, and sum over the targets:
keyword_list = ["nike shoes", "airmax shoes"]
target_keyword_list = ["nike shoes", "shoes nike", "nike airmax shoes"]
check_list = []
for keywords in keyword_list:
    check_list.append(
        sum(set(keywords.split()).issubset(target.split())
            for target in target_keyword_list)
    )
print(check_list)  # [3, 1]
It may be easier to follow with a more verbose syntax:
check_list = []
for keywords in keyword_list:
    ks = set(keywords.split())
    count = 0
    for target in target_keyword_list:
        # issubset() returns a bool; adding it increments count on a match
        count += ks.issubset(target.split())
    check_list.append(count)
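The subset check above can also be wrapped in a small helper. This is only a sketch, assuming words are separated by whitespace and matching is case-sensitive; the function name count_matches is hypothetical and not part of the answer.

```python
def count_matches(keyword, targets):
    """Count targets containing every word of `keyword`, in any order."""
    words = set(keyword.split())
    # True counts as 1 when summed, so this totals the matching targets.
    return sum(words.issubset(target.split()) for target in targets)


keyword_list = ["nike shoes", "airmax shoes"]
target_keyword_list = ["nike shoes", "shoes nike", "nike airmax shoes"]
check_list = [count_matches(k, target_keyword_list) for k in keyword_list]
print(check_list)  # [3, 1]
```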
CodePudding user response:
So each keyword is actually a sequence of spaced words, each of which must be present for it to be a match?
This is how I would do it. The list comprehension is kinda pushing the limit for how big I like them to be, and I would be tempted to just break it out into a regular for loop, but since you had one I kept one.
keyword_list = ["nike shoes", "puma shoes", "nike sneakers", "puma sneakers"]
target_keyword_list = ["nike shoes", "shoes nike", "nike airmax shoes", "puma sneakers"]
check_list = []
for keyword in keyword_list:
    check_freq = sum(all(part in target_keyword for part in keyword.split(' ')) for target_keyword in target_keyword_list)
    check_list.append(check_freq)
print(check_list)  # [3, 0, 0, 1]
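One thing to be aware of with this version: `part in target_keyword` is a substring test on the whole string, not a whole-word test, so a keyword word such as shoe would also match shoes. A small sketch of the difference, using made-up example strings that are not from the question:

```python
target = "nike shoes"
keyword = "nike shoe"  # hypothetical keyword, chosen to show the difference

# Substring test on the string: "shoe" is found inside "shoes".
substring_match = all(part in target for part in keyword.split())

# Whole-word test: compare against the list of words instead.
word_match = all(part in target.split() for part in keyword.split())

print(substring_match, word_match)  # True False
```

If whole-word matching is what you want, splitting the target first (or using sets) avoids the partial hits.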
CodePudding user response:
You can use set intersection:
keyword_list = ["nike shoes"]
target_keyword_list = ["nike shoes", "shoes nike",'jack nike', "nike airmax shoes",'shoes for jim']
check_list = {}
for keyword in keyword_list:
    temp0 = set(keyword.split())
    for target in target_keyword_list:
        temp1 = set(target.split())
        intersection = temp0 & temp1
        found = temp0 == intersection
        check_list[target] = found
for k, v in check_list.items():
    print(f'{k} --> {v}')
Output:
nike shoes --> True
shoes nike --> True
jack nike --> False
nike airmax shoes --> True
shoes for jim --> False
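Since the question asks for a count rather than a per-target result, the booleans can be summed. A minimal sketch, assuming the same lists as in this answer; the names wanted and match_count are only illustrative:

```python
keyword_list = ["nike shoes"]
target_keyword_list = ["nike shoes", "shoes nike", "jack nike",
                       "nike airmax shoes", "shoes for jim"]

check_list = []
for keyword in keyword_list:
    wanted = set(keyword.split())
    # A target matches when the intersection keeps every wanted word.
    match_count = sum((wanted & set(target.split())) == wanted
                      for target in target_keyword_list)
    check_list.append(match_count)

print(check_list)  # [3]
```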
CodePudding user response:
You could try with set difference:
check_list = list()
for keyword in keyword_list:
    check_list.append(len([t for t in target_keyword_list if not set(keyword.split()).difference(set(t.split()))]))
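For reference, a self-contained version of this idea, assuming the lists from the question: an empty difference means no keyword word is missing from the target, which is the same condition as a subset test. The names wanted and matches are illustrative.

```python
keyword_list = ["nike shoes"]
target_keyword_list = ["nike shoes", "shoes nike", "nike airmax shoes"]

check_list = []
for keyword in keyword_list:
    wanted = set(keyword.split())
    # An empty difference is falsy, so `not` marks a full match.
    matches = [t for t in target_keyword_list
               if not wanted.difference(t.split())]
    check_list.append(len(matches))

print(check_list)  # [3]
```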
CodePudding user response:
If I get it correctly, you are trying to see how many combinations there are in target_keyword_list that have at least one word from each key in keyword_list.
For that, you'd first have to split the strings into separate words. You can do that with the .split() string method:
for kw_item in keyword_list:
    words = kw_item.split()
    # ['nike', 'shoes']
Then go through the second list and check whether each item contains a word from words. If you only want to check for the same letter combinations, you can simply check each word against each string in target_keyword_list:
for t_item in target_keyword_list:
    for word in words:
        if word in t_item:
            count += 1
And then add count to the result before going to the next item in keyword_list.
If the idea is to check for each separate word (considering shoe and shoes different words), then you'll also have to split every string in target_keyword_list and run a direct comparison between the words in those lists.
for t_item in target_keyword_list:
    t_words = t_item.split()
    for word in words:
        if word in t_words:
            count += 1
This is not the most elegant or short logic, but it's easy to understand and apply to different situations, in case you need a slightly different solution. Also, once you understand how it works, you can shorten the for ... in and if constructions into one-liners.
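As a sketch of that shortening, assuming the goal from the question (count the targets that contain every word of a keyword, compared word by word), the nested loops can collapse into all() inside sum(). Note that this counts matching targets, not individual word hits; the name result is illustrative.

```python
keyword_list = ["nike shoes"]
target_keyword_list = ["nike shoes", "shoes nike", "nike airmax shoes"]

result = [
    # all(...) is True only when every keyword word is in the target's words.
    sum(all(word in t_item.split() for word in kw_item.split())
        for t_item in target_keyword_list)
    for kw_item in keyword_list
]
print(result)  # [3]
```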