Searching for strings in one txt file in another txt file

Time:07-24

I want to find every domain in a domains file that contains any keyword from a keywords file, without modifying either file. Each file has one entry per line: one keyword per line in the keywords file and one domain per line in the domains file. Performance matters because there is a lot of data.

My code:

def search_keyword_domain():
    # here I just want to search for the keyword in the domain file without doing anything on the files.
    # There is one key and one domain in each line.
    # Please consider performance as there is a lot of data
    
    with open("result.txt", "a") as result:
        result.writelines(line)


def search_keyword():
    with open('domains.txt', 'r') as d:
        for line in d:
            line.strip()
        d.close()

    with open('keywords.txt', 'r') as f:
        for line in f:
            line = line.strip()
            search_keyword_domain(line)
        f.close()


if __name__ == '__main__':
    search_keyword()

EXAMPLE:

keywords.txt: Note: There are 180 keywords.

google
messi
apple

domains.txt: Note: There are 280 million domains.

google.com
ronaldovsmessi.com
anapple.com

CodePudding user response:

Few things first:

  • There is no need for several functions here. To make things more general, write a single function that takes the file names as input arguments.
  • If you use a context manager (the with statement), you don't need to close the file manually; the context manager does that for you.
  • You don't want to loop through the keywords file and, in every iteration, call search_keyword_domain(line), which opens the result file, appends data, and closes the file again. It is better to collect the data, open the file once, and write everything to it before it is closed.
  • The first part of search_keyword(), where you open domains.txt, does nothing useful. You loop through the file once without saving any information, and line.strip() discards its result.

Here is my solution:

As input I used keywords.txt:

google
messi
apple
bus
weather

domains.txt:

google.com
ronaldovsmessi.com
anapple.com
twitter.com
weather.com
youtube.com

def search_matching_domains(keywords_file, domain_file, result_file):
    with open(keywords_file, 'r') as f:
        # keywords is a list with all keywords; blank lines are skipped
        # because an empty keyword would match every domain
        keywords = [k for k in f.read().splitlines() if k]

    result = []  # collect all matching domains in this list
    with open(domain_file, 'r') as g:
        for line in g:
            domain = line.strip()  # strip once per line, not once per keyword
            # if any keyword matches the current domain, append the domain
            if any(keyword in domain for keyword in keywords):
                result.append(line)

    with open(result_file, 'w') as h:
        h.write(''.join(result))

search_matching_domains('keywords.txt', 'domains.txt', 'result.txt')

Output in result.txt:

google.com
ronaldovsmessi.com
anapple.com
weather.com
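
With 180 keywords and 280 million domains, checking every keyword against every line means up to 180 substring tests per domain. One possible alternative, sketched below, combines the keywords into a single compiled regular expression and writes each match as soon as it is found, so neither the domains file nor the list of results has to fit in memory. The function name stream_matching_domains is hypothetical and not part of the answer above, and whether this is actually faster depends on the data, so it is worth timing on a sample first.

```python
import re


def stream_matching_domains(keywords_file, domain_file, result_file):
    """Write every domain containing at least one keyword to result_file.

    Returns the number of matching domains. Hypothetical helper, shown
    only as a sketch.
    """
    with open(keywords_file, 'r') as f:
        # Skip blank lines: an empty keyword would match every domain.
        keywords = [line.strip() for line in f if line.strip()]

    if not keywords:
        # Nothing to search for: create an empty result file and stop.
        with open(result_file, 'w'):
            pass
        return 0

    # One alternation of literal keywords, e.g. "google|messi|apple".
    # re.escape keeps characters such as "." from acting as wildcards.
    pattern = re.compile('|'.join(re.escape(k) for k in keywords))

    count = 0
    with open(domain_file, 'r') as domains, open(result_file, 'w') as out:
        for line in domains:  # reads one line at a time
            domain = line.strip()
            if pattern.search(domain):
                out.write(domain + '\n')
                count += 1
    return count
```

It is called the same way as the function above, for example stream_matching_domains('keywords.txt', 'domains.txt', 'result.txt').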

CodePudding user response:

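# read both files; the with statement closes them automatically
with open("keywords.txt", "r") as f:
    kw_ar = f.read().splitlines()  # also keeps a last line without a trailing newline
with open("domains.txt", "r") as f:
    dw_ar = f.read().splitlines()

for k in kw_ar:
    for d in dw_ar:
        if k in d:
            print(k, "->", d)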

  1. read the two text files
  2. split their contents into arrays
  3. test for every keyword if it is in any line of domains.txt
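
The steps above load both files fully into memory, which may not be practical for 280 million domains. Below is a minimal sketch of the same keyword-to-domain search that keeps only the keywords in memory and reads the domains one line at a time. The name iter_keyword_matches is an assumption made for illustration.

```python
def iter_keyword_matches(keywords_path, domains_path):
    """Yield (keyword, domain) pairs where the keyword is a substring of the domain.

    Hypothetical helper: only the keyword list is held in memory.
    """
    with open(keywords_path, "r") as f:
        # Skip blank lines: an empty keyword would match every domain.
        keywords = [line.strip() for line in f if line.strip()]

    with open(domains_path, "r") as f:
        for line in f:  # the file object yields one line at a time
            domain = line.strip()
            for k in keywords:
                if k in domain:
                    yield k, domain
```

Iterating over iter_keyword_matches("keywords.txt", "domains.txt") and printing each pair gives the same matches as the nested loops, but grouped by domain rather than by keyword.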

CodePudding user response:

Modifying your code:

import re
import csv

domains = []  # filled by search_keyword()

# convert csv file to txt
txt_file = "./domains.txt"
csv_file = "1.csv"

with open(txt_file, "w") as my_output_file:
    with open(csv_file, "r", newline="") as my_input_file:
        # write each CSV row as one space-separated line
        for row in csv.reader(my_input_file):
            my_output_file.write(" ".join(row) + '\n')

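def search_keyword_domain(line):
    # re.escape makes the keyword match literally (e.g. "." is not a wildcard)
    pattern = re.compile(re.escape(line))
    matching_domains = []
    for x in domains:
        if pattern.search(x):
            matching_domains.append(x)
    print(matching_domains)
    return matching_domains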

def search_keyword():
    # the with statement closes each file, so no explicit close() is needed
    with open('domains.txt', 'r') as d:
        for line in d:
            line = line.strip()
            domains.append(line)

    with open('keywords.txt', 'r') as f:
        for line in f:
            line = line.strip()
            search_keyword_domain(line)


if __name__ == '__main__':
    search_keyword()

If you only want to search and plan to use the results for another CSV file, you don't have to convert the CSV file to text first; the rows can be read directly with csv.reader. The results are mostly the same, and the list built in search_keyword_domain can be used to write a new CSV file.
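
A minimal sketch of that CSV-to-CSV variant, skipping the intermediate text file. The function name filter_csv_by_keywords and the column parameter are hypothetical, and the sketch assumes the domain is in the first field of each row, which the original does not state.

```python
import csv


def filter_csv_by_keywords(keywords, input_csv, output_csv, column=0):
    """Copy rows whose `column` field contains any keyword to output_csv.

    column=0 assumes the domain is the first field (an assumption).
    Returns the number of rows written.
    """
    # Drop empty keywords: an empty string is a substring of everything.
    keywords = [k for k in keywords if k]

    written = 0
    # newline="" is what the csv module documentation recommends for files
    # passed to csv.reader and csv.writer.
    with open(input_csv, "r", newline="") as src, \
         open(output_csv, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Skip rows that are too short to have the requested column.
            if len(row) > column and any(k in row[column] for k in keywords):
                writer.writerow(row)
                written += 1
    return written
```

For example, filter_csv_by_keywords(["google", "apple"], "1.csv", "matches.csv") would copy the matching rows of 1.csv into matches.csv, where matches.csv is a placeholder file name.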
