Multiprocessing queries new lines from a file

I'm trying to use multiprocessing to speed things up. The goal: each process should query the domains defined inside a text file. When I run it, though, the processes all do the same thing: every process queries from the first line, instead of each process taking new lines. What I want is for each process to query the domains listed on different lines of the source .txt. Here's the code I'm using:

import os
import fnmatch
import requests
from collections import defaultdict
from multiprocessing import Process, Event, cpu_count
from time import sleep

class diginfo:
    expected_response = 101
    control_domain = 'd2f99r5bkcyeqq.cloudfront.net'
    payloads = { "Host": control_domain, "Upgrade": "websocket", "DNT":  "1", "Accept-Language": "*", "Accept": "*/*", "Accept-Encoding": "*", "Connection": "keep-alive, upgrade", "Upgrade-Insecure-Requests": "1", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36" }
    file_hosts = ""
    result_success = []
    num_file = 1
    columns = defaultdict(list)
    txtfiles= []
    hostpath = 'host'

def engines(counts, terminate, reach):
    for domain in domainlist:
        try:
            r = requests.get("http://" + domain, headers=headers, timeout=0.7, allow_redirects=False)
            if r.status_code == diginfo.expected_response:
                print("Success " + domain)
                print(domain, file=open("RelateCFront.txt", "a"))
                diginfo.result_success.append(str(domain))
            else:
                print("Failed " + domain + " " + str(r.status_code))
        except requests.exceptions.RequestException:
            pass

    print(" Loaded : "    str(len(diginfo.result_success)))
    if len(diginfo.result_success) >= 0:
        print(" Successfull Result : ")
    for result in diginfo.result_success:
        print("  "   result)
    print("")
    while not terminate.is_set():
        reach.set()
        break

def fromtext():
    global headers, domainlist
    headers = diginfo.payloads
    files = os.listdir(diginfo.hostpath)
    for f in files:
        if fnmatch.fnmatch(f, '*.txt'):
            print(str(diginfo.num_file), str(f))
            diginfo.num_file = diginfo.num_file + 1
            diginfo.txtfiles.append(str(f))

    fileselector = input("Choose Target Files : ")
    print("Target Chosen : " + diginfo.txtfiles[int(fileselector) - 1])
    file_hosts = str(diginfo.hostpath) + "/" + str(diginfo.txtfiles[int(fileselector) - 1])

    with open(file_hosts) as f:
        parseddom = f.read().split()

    domainlist = list(set(parseddom))
    domainlist = list(filter(None, domainlist))

    terminate = Event()
    reach = Event()
    for counts in range(cpu_count()):
        p = Process(target=engines, args=(counts, terminate, reach))
        p.start()
    reach.wait()
    terminate.set()
    sleep(3)
    exit()

fromtext()

Here's what I have done:

for domain in domainlist:
    p = Process(target=engines, args=(domainlist, terminate, reach))
    p.start()

It seems unresponsive, ends with zero results, and spawns processes endlessly. I can't also pass the counts argument, since engines() only accepts three arguments. terminate and reach are used to send a signal once the requirements are reached.

CodePudding user response:

You need to split domainlist up into cpu_count() sections, and pass each section to a different process. You're also using the events incorrectly: currently, it will exit 3 seconds after any process finishes, regardless of whether the others are still working. You should use a Barrier instead, or just call join() on each process in fromtext():

def engines(domainsublist):
    for domain in domainsublist:
        ...

def fromtext():
    ...
    num_cpus = cpu_count()
    processes = []
    for process_num in range(num_cpus):
        section = domainlist[process_num::num_cpus]  # every num_cpus-th domain, starting at offset process_num
        p = Process(target=engines, args=(section,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
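
For reference, the Barrier variant mentioned above would look roughly like this. This is a minimal sketch, reusing domainlist and the striding from the answer's code: the barrier's party count includes the parent, so barrier.wait() in the parent only returns once every worker has arrived.

from multiprocessing import Barrier, Process, cpu_count

num_cpus = cpu_count()
# parties = workers + the parent, so everyone is released together
barrier = Barrier(num_cpus + 1)

def engines(domainsublist, barrier):
    for domain in domainsublist:
        ...                    # query each domain, as in the original code
    barrier.wait()             # block until every party has arrived

for process_num in range(num_cpus):
    section = domainlist[process_num::num_cpus]
    Process(target=engines, args=(section, barrier)).start()

barrier.wait()                 # returns only once all workers are done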

Finally, you've got some race conditions in engines(): when you write to RelateCFront.txt and when you append to diginfo.result_success. There are plenty of good solutions for these on SO; I won't try to fix them here.
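
That said, one common pattern is to have each worker hand its successes back over a multiprocessing.Queue, so that only the parent process ever touches the file and the shared list. A minimal sketch, with a hypothetical check_domain() standing in for the requests logic:

from multiprocessing import Process, Queue, cpu_count

def engines(domainsublist, results):
    for domain in domainsublist:
        if check_domain(domain):   # hypothetical stand-in for the requests.get() check
            results.put(domain)
    results.put(None)              # sentinel: this worker is finished

def fromtext():
    ...
    num_cpus = cpu_count()
    results = Queue()
    processes = []
    for process_num in range(num_cpus):
        section = domainlist[process_num::num_cpus]
        p = Process(target=engines, args=(section, results))
        p.start()
        processes.append(p)

    finished = 0
    with open("RelateCFront.txt", "a") as out:
        while finished < num_cpus:   # drain the queue until every worker is done
            domain = results.get()
            if domain is None:
                finished += 1
            else:
                diginfo.result_success.append(domain)
                print(domain, file=out)
    for p in processes:
        p.join()

Draining the queue before join() matters here: a child that still has items buffered in the queue's pipe may never exit otherwise.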
