generating a score for a domain with a keyword

Time:08-03

I have a list of keywords in keyword.txt and a list of domains in domain.txt. I want to score each domain based on the keywords it contains. For example, if the keyword apple is worth 30 points, every domain containing apple should get 30 points, and every other domain should get 0. How do I do that?

My code:

score_dict = {"apple":"30",
              "bananas":"50"}

def generate_score():
    with open("keyword.txt", "r") as file:
        keywords = file.read().splitlines()

    with open("domain.txt", 'r') as g:
        score = []
        for line in g:
            if any(keyword in line.strip() for keyword in keywords):
                score.append(line)

keyword.txt:

apple #if the domain contains the word apple, it will get a  30 score

domain.txt:

apple.com
redapple.com
paple.com

The output I want:

apple.com , 30
redapple.com, 30
paple.com, 0

CodePudding user response:

Here is an iterative approach that loops over the lines in domain.txt and checks each one for a partial string match against the keys of the dictionary:

score_dict = {"apple": 30,
              "bananas": 50}

results = []
with open("domain.txt", 'r') as g:
    for line in g: #loop lines in domain.txt
        domain = line.strip()
        if not domain: #skip blank lines
            continue
        for substring, score in score_dict.items(): #loop score_dict
            if substring in domain:
                results.append({'domain': domain, 'substring': substring, 'score': score})
                break #break on hit to avoid unnecessary iterations
        else: #loop ended without break, so no keyword matched: assign score 0
            results.append({'domain': domain, 'substring': None, 'score': 0})

Output:

[{'domain': 'apple.com', 'substring': 'apple', 'score': 30},
 {'domain': 'redapple.com', 'substring': 'apple', 'score': 30},
 {'domain': 'paple.com', 'substring': None, 'score': 0}]
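
To get the exact "domain, score" lines asked for in the question, the same loop can be wrapped in a small function. This is only a sketch: the names score_domains and format_scores are made up for illustration, and the domains are passed in as a list here so the example runs without any files (with real data you could pass the open file object for domain.txt instead).

```python
def score_domains(domains, score_dict):
    """Return (domain, score) pairs; the score is 0 when no keyword matches."""
    results = []
    for raw in domains:
        domain = raw.strip()
        if not domain:
            continue  # skip blank lines
        # The first keyword (in dict order) found as a substring wins.
        score = next((s for kw, s in score_dict.items() if kw in domain), 0)
        results.append((domain, int(score)))  # int() also accepts scores stored as "30"
    return results


def format_scores(pairs):
    """Render the pairs as 'domain, score' lines."""
    return [f"{domain}, {score}" for domain, score in pairs]


# In-memory stand-in for the lines of domain.txt
sample_domains = ["apple.com\n", "redapple.com\n", "paple.com\n"]
lines = format_scores(score_domains(sample_domains, {"apple": "30", "bananas": "50"}))
print("\n".join(lines))
```

Like the loop above, this stops at the first matching keyword, so a domain that contains several keywords only receives the score of the first one in the dictionary.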

Note that this solution can be slow on large files. In that case you could vectorize the lookup using pandas with str.extract:

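import re
import pandas as pd

score_dict = {"apple": 30,
              "bananas": 50}

df = pd.read_csv('domain.txt', names=['string']) #read domain.txt as pandas DataFrame
pattern = '(' + '|'.join(map(re.escape, score_dict)) + ')' #one capture group matching any keyword
#extract the first matching keyword, map it to its score, use 0 where nothing matched
df['score'] = df['string'].str.extract(pattern, expand=False).map(score_dict).fillna(0).astype(int)
df.to_csv('output.csv') # save output to csv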

Results:

         string  score
0     apple.com     30
1  redapple.com     30
2     paple.com      0
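
If pandas is not an option, the same alternation idea can be tried with the standard library re module by compiling the pattern once and reusing it for every domain. The sketch below is an assumption about how that could look, not part of the answer above; build_scorer is a hypothetical helper name, and it assumes score_dict is not empty.

```python
import re


def build_scorer(score_dict):
    """Compile one alternation pattern and return a function that scores a domain.

    Assumes score_dict is non-empty (an empty alternation would match everything).
    """
    # Longer keywords first, so that at a given position the longest keyword is preferred.
    keywords = sorted(score_dict, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keywords))

    def score(domain):
        match = pattern.search(domain)  # leftmost keyword occurrence, if any
        return int(score_dict[match.group(0)]) if match else 0

    return score


score = build_scorer({"apple": 30, "bananas": 50})
scores = {d: score(d) for d in ["apple.com", "redapple.com", "paple.com"]}
print(scores)
```

One difference from the loop version: the regex picks the keyword that occurs leftmost in the domain, while the loop picks the first keyword in dictionary order, so the two can disagree for domains that contain more than one keyword.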