Lock with Multi-threading

Hey all, trust that you're well. I'm currently trying to make each process (thread) either read a different txt file or read from one text file, with a certain number of lines assigned to each process.

Example: if the txt file contained 20 user names, process one would message the first 10 users specified in the text file and process two would message the other 10 users.

Question: How would I read 10 lines in a text file, delete 10 lines, and read the next 10 lines with each process created, assuming that the file has 20 lines?

Reading a specified amount

with open("test.txt", "r") as fp:
    for linenr, line in enumerate(fp):
        if linenr > 9:
            break
        elif linenr >= 0:
            print(line)

Deleting a specified amount

with open("test.txt", "r+") as fp:
    # read and store all lines in a list
    lines = fp.readlines()
    # move the file pointer to the beginning of the file
    fp.seek(0)
    # truncate the file
    fp.truncate()

    # write back everything except the first 10 lines
    # lines[10:] is line 11 through the last line
    fp.writelines(lines[10:])

Code:

import time
from selenium import webdriver
import threading
import json

def test_instance(data):
    Options = webdriver.ChromeOptions()
    mobile_emulation = {"userAgent": "Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/101.0.4951.64 Mobile Safari/535.19"}
    Options.add_experimental_option("mobileEmulation", mobile_emulation)
    Options.add_argument("--log-level=3")

    bot = webdriver.Chrome(options=Options, executable_path="chromedriver.exe")
    bot.set_window_size(500, 768)
    bot.get("https://www.instagram.com/")
    
    time.sleep(10)

    # Login section==========================
    print('Logging in...')
    bot.find_element_by_xpath('//*[@id="react-root"]/section/main/article/div/div/div/div[3]/button[1]').click()
    time.sleep(5)
    username_field = bot.find_element_by_xpath('//*[@id="loginForm"]/div[1]/div[3]/div/label/input')
    username_field.send_keys(data['username'])
    time.sleep(5)
    password_field = bot.find_element_by_xpath('//*[@id="loginForm"]/div[1]/div[4]/div/label/input')
    password_field.send_keys(data['password'])
    time.sleep(5)
    bot.find_element_by_xpath('//*[@id="loginForm"]/div[1]/div[6]/button').click()
    time.sleep(6)
    
    bot.quit()

f = open('accounts.json',)
data = json.load(f)
f.close()
process_count = 2  # number of tests to run (each test opens a separate browser)
thread_list = []

# Start test
for i in range(process_count):
    t = threading.Thread(name=f'Test {i}', target=test_instance, args=[data[i]])
    t.start()
    time.sleep(1)
    print(t.name + ' started')
    thread_list.append(t)

# Wait for all threads to complete
for thread in thread_list:
    thread.join()

print('Test completed')

CodePudding user response：

Reading from and writing to the same file from different threads at the same time is generally a very bad idea, especially if you are also seeking and truncating the file. Unless you use a Lock to serialize access, you cannot be sure which data is read or written in what order.
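If the file really does have to be shared, serializing access with a threading.Lock could look roughly like the sketch below. The helper name take_lines and the module-level file_lock are made up for illustration; the read/seek/truncate/rewrite steps are the ones from the question, just wrapped so that only one thread runs them at a time.

```python
import threading

file_lock = threading.Lock()  # a single lock shared by every thread


def take_lines(path, count):
    """Remove and return the first `count` lines of the file at `path`.

    Hypothetical helper: the lock makes the read, truncate, and rewrite
    steps happen as one unit, so two threads never get the same lines.
    """
    with file_lock:
        with open(path, "r+") as fp:
            lines = fp.readlines()
            fp.seek(0)
            fp.truncate()
            # write back everything that was not taken
            fp.writelines(lines[count:])
    return [ln.rstrip("\n") for ln in lines[:count]]
```

Each thread would call take_lines("test.txt", 10) once and work on the list it gets back. This works, but every call rewrites the whole file, which is why reading the file once in the main program is simpler.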

To keep it simple, I would suggest letting the main program read the input file into a list, then giving each thread a slice of that list to act on.

Example of reading a list of users:

# Read file into list:
with open("users.txt") as uf:
    users = [ln.strip() for ln in uf if ln[0] not in '\r\n']

Let's break this code (it's called a list comprehension) down:

for ln in uf

Iterates over the lines in the file.

if ln[0] not in '\r\n'

This skips empty lines.

ln.strip()

This removes leading and trailing whitespace, including newlines and carriage returns.

Note that after the with-statement is finished, uf is a closed file, so you can't read from it anymore.

Creating pairs for threads to iterate over

Let's say that numusers (that is len(users)) is 34.

numusers = 34

We can then create a list of slice boundaries like this:

im = [n for n in range(numusers + 1) if n % 10 == 0 or n == numusers]

This would produce the list [0, 10, 20, 30, 34]

Now to create pairs:

pairs = list(zip(im[:-1], im[1:]))

Then pairs will be [(0, 10), (10, 20), (20, 30), (30, 34)]
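The boundary and pairing steps can also be folded into one small helper. This is only a sketch: the name make_pairs and its chunk parameter are not part of the answer above, they are assumptions for illustration.

```python
def make_pairs(numusers, chunk=10):
    """Return (start, stop) index pairs that cover range(numusers) in steps of `chunk`."""
    starts = range(0, numusers, chunk)
    # the last pair is cut short when numusers is not a multiple of chunk
    return [(s, min(s + chunk, numusers)) for s in starts]


print(make_pairs(34))  # [(0, 10), (10, 20), (20, 30), (30, 34)]
```

Each pair can be used directly as a slice, e.g. users[start:stop].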

Using a ThreadPoolExecutor

You can then write a function that takes a 2-tuple like (0, 10) as an argument, and does something for each of those users.

import concurrent.futures as cf


def target(pair):
    first, last = pair
    for user in users[first:last]:
        # do the whole login thing
        pass
    # You should probably at least return a success or error code.
    return f"users {users[first]} to {users[last-1]} processed"

# "executor" rather than "exec", which would shadow the built-in exec()
with cf.ThreadPoolExecutor(max_workers=2) as executor:
    results = executor.map(target, pairs)

I would suggest having each thread put the data it wants to write in a list. When all the worker threads are finished, concatenate the lists and then write them to an output file from the main thread.
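A minimal sketch of that collect-then-write approach, assuming each worker simply returns its output lines. The names process_chunk and run_and_write, and the "processed" text, are placeholders rather than anything from the question.

```python
import concurrent.futures as cf


def process_chunk(chunk):
    """Hypothetical worker: returns its output lines instead of writing them."""
    lines = []
    for user in chunk:
        # the real login/messaging work would go here
        lines.append(f"{user}: processed")
    return lines


def run_and_write(users, pairs, output_path):
    """Process each slice in a worker thread, then write everything from the caller."""
    chunks = [users[first:last] for first, last in pairs]
    with cf.ThreadPoolExecutor(max_workers=2) as executor:
        # map() yields results in the same order as the input chunks
        per_thread = list(executor.map(process_chunk, chunks))
    # Only the calling (main) thread touches the output file.
    with open(output_path, "w") as out:
        for lines in per_thread:
            out.writelines(line + "\n" for line in lines)
```

Because no worker ever opens the output file, no Lock is needed here.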

Alternatively, you can write from each thread, but you have to protect the file access with a Lock, so that multiple threads never try to write to the same file at once.
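A rough sketch of the Lock-protected variant follows. The helper name append_result and the module-level write_lock are assumptions for illustration; the important part is that every thread goes through the same Lock object.

```python
import threading

write_lock = threading.Lock()  # must be the same object for all threads


def append_result(path, text):
    """Append one line to the file at `path`, one thread at a time."""
    with write_lock:
        with open(path, "a") as out:
            out.write(text + "\n")
```

A worker would call something like append_result("results.txt", f"{user}: done") after handling each user. The lines end up in whatever order the threads finish, not in input order.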

Another thing to keep in mind is that Chrome is not a lightweight piece of software! Running too many instances of it at the same time might overload your PC or saturate your network connection.