Hey all, I trust that you're well. I'm trying to make each process (thread) either read a different txt file or read an assigned portion of a single text file.
Example: if the txt file contains 20 usernames, process one messages the first 10 users listed in the file and process two messages the other 10.
Question: How would I read 10 lines in a text file, delete 10 lines, and read the next 10 lines with each process created, assuming that the file has 20 lines?
Reading a specified amount
with open("test.txt", "r") as fp:
    for linenr, line in enumerate(fp):
        if linenr > 9:
            break
        elif linenr >= 0:
            print(line)
Deleting a specified amount
with open("test.txt", "r+") as fp:
    # read and store all lines in a list
    lines = fp.readlines()
    # move the file pointer to the beginning of the file
    fp.seek(0)
    # truncate the file
    fp.truncate()
    # write back everything except the first 10 lines
    # lines[10:] runs from line 11 to the last line
    fp.writelines(lines[10:])
Code:
import time
from selenium import webdriver
import threading
import json
def test_instance(data):
    Options = webdriver.ChromeOptions()
    mobile_emulation = {"userAgent": "Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/101.0.4951.64 Mobile Safari/535.19"}
    Options.add_experimental_option("mobileEmulation", mobile_emulation)
    Options.add_argument("--log-level=3")
    bot = webdriver.Chrome(options=Options, executable_path="chromedriver.exe")
    bot.set_window_size(500, 768)
    bot.get("https://www.instagram.com/")
    time.sleep(10)
    # Login section==========================
    print('Logging in...')
    bot.find_element_by_xpath('//*[@id="react-root"]/section/main/article/div/div/div/div[3]/button[1]').click()
    time.sleep(5)
    username_field = bot.find_element_by_xpath('//*[@id="loginForm"]/div[1]/div[3]/div/label/input')
    username_field.send_keys(data['username'])
    time.sleep(5)
    password_field = bot.find_element_by_xpath('//*[@id="loginForm"]/div[1]/div[4]/div/label/input')
    password_field.send_keys(data['password'])
    time.sleep(5)
    bot.find_element_by_xpath('//*[@id="loginForm"]/div[1]/div[6]/button').click()
    time.sleep(6)
    bot.quit()
with open('accounts.json') as f:
    data = json.load(f)
process_count = 2 # number of tests to run (each test opens a separate browser)
thread_list = []
# Start test
for i in range(process_count):
    t = threading.Thread(name=f'Test {i}', target=test_instance, args=[data[i]])
    t.start()
    time.sleep(1)
    print(t.name + ' started')
    thread_list.append(t)
# Wait for all threads to complete
for thread in thread_list:
    thread.join()
print('Test completed')
Answer:
Reading from and writing to the same file from different threads at the same time is generally a very bad idea, especially if you are also seeking and truncating the file. Unless you use a Lock to serialize access, you cannot be sure which data is read or written in what order.
To keep it simple, I would suggest letting the main program read the input file into a list and giving each thread a slice of that list to act on.
Example of reading a list of users:
# Read file into list:
with open("users.txt") as uf:
    users = [ln.strip() for ln in uf if ln[0] not in '\r\n']
Let's break this code (it's called a list comprehension) down:
for ln in uf
Iterates over the lines in the file.
if ln[0] not in '\r\n'
This skips empty lines.
ln.strip()
This removes leading and trailing whitespace, e.g. newlines and carriage returns.
Note that after the with-statement is finished, uf is a closed file, so you can't read from it anymore.
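The check on ln[0] only skips lines that are completely empty; a line holding nothing but spaces would still end up in the list as an empty string. A minimal sketch of a slightly stricter variant follows. The helper name read_users and the demo file contents are made up for illustration and are not part of the original code.

```python
import tempfile
from pathlib import Path

def read_users(path):
    """Return the non-blank lines of a text file, stripped of surrounding whitespace."""
    with open(path, encoding="utf-8") as uf:
        # ln.strip() is an empty (falsy) string for blank and whitespace-only lines.
        return [ln.strip() for ln in uf if ln.strip()]

# Demo with a throwaway file; the user names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "users.txt"
    demo_path.write_text("alice\n\n  bob  \n   \ncarol\n", encoding="utf-8")
    demo_users = read_users(demo_path)

print(demo_users)  # ['alice', 'bob', 'carol']
```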
Creating pairs for threads to iterate over
Let's say that numusers (that is, len(users)) is 34.
numusers = 34
We can then create a list of pairs like this:
im = [n for n in range(numusers + 1) if n % 10 == 0 or n == numusers]
This would produce the list [0, 10, 20, 30, 34]
Now to create pairs:
pairs = list(zip(im[:-1], im[1:]))
Then pairs will be [(0, 10), (10, 20), (20, 30), (30, 34)]
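If the chunk size needs to vary, the boundary list and the zip step can be folded into one helper. This is only a sketch; make_pairs and its chunk parameter are hypothetical names, not something from the code above.

```python
def make_pairs(numusers, chunk=10):
    """Return (start, stop) index pairs covering range(numusers) in steps of `chunk`."""
    # min() clips the last pair so it never runs past the end of the list.
    return [(start, min(start + chunk, numusers))
            for start in range(0, numusers, chunk)]

print(make_pairs(34))  # [(0, 10), (10, 20), (20, 30), (30, 34)]
print(make_pairs(20))  # [(0, 10), (10, 20)]
```

Each pair can be used directly as a slice, e.g. users[start:stop].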
Using a ThreadPoolExecutor
You can then write a function that takes a 2-tuple like (0, 10) as an argument, and does something for each of those users.
import concurrent.futures as cf
def target(pair):
    first, last = pair
    for user in users[first:last]:
        # do the whole login thing
        pass
    # You should probably at least return a success or error code.
    return f"users {users[first]} to {users[last-1]} processed"
with cf.ThreadPoolExecutor(max_workers=2) as executor:
    results = executor.map(target, pairs)
I would suggest having each thread put the data it wants to write in a list. When all the worker threads are finished, concatenate the lists and then write them to an output file from the main thread.
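As a rough sketch of that approach: each worker returns its own list of lines, and only the main thread touches the output file. The user names, the "ok" status text, and the results.txt file name are placeholders chosen for this example.

```python
import concurrent.futures as cf
import tempfile
from pathlib import Path

users = [f"user{n}" for n in range(7)]  # placeholder data
pairs = [(0, 3), (3, 6), (6, 7)]

def target(pair):
    first, last = pair
    lines = []
    for user in users[first:last]:
        lines.append(f"{user}: ok")  # stand-in for the real per-user work
    return lines

with cf.ThreadPoolExecutor(max_workers=2) as executor:
    # map() yields results in the same order as `pairs`.
    per_thread = list(executor.map(target, pairs))

# Concatenate the per-thread lists in the main thread.
all_lines = [line for lines in per_thread for line in lines]

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "results.txt"
    out_path.write_text("\n".join(all_lines) + "\n", encoding="utf-8")
    written = out_path.read_text(encoding="utf-8").splitlines()

print(len(written))  # 7
```

Because no worker ever opens the file, no locking is needed here.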
Alternatively, you can write from each thread, but you have to protect the file access with a Lock, so you don't have multiple threads trying to write to the same file at once.
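A minimal sketch of the Lock variant, assuming a shared module-level threading.Lock; append_line, worker, and the log file name are hypothetical names used only for this illustration.

```python
import tempfile
import threading
from pathlib import Path

write_lock = threading.Lock()

def append_line(path, text):
    # Only one thread at a time can hold the lock, so writes cannot interleave.
    with write_lock:
        with open(path, "a", encoding="utf-8") as out:
            out.write(text + "\n")

def worker(path, name):
    for i in range(5):
        append_line(path, f"{name} finished item {i}")

with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "log.txt"
    threads = [threading.Thread(target=worker, args=(log_path, f"thread-{n}"))
               for n in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    logged = log_path.read_text(encoding="utf-8").splitlines()

print(len(logged))  # 15
```

The order of lines from different threads is still not deterministic; the lock only guarantees that each line is written whole.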
Another thing to keep in mind is that Chrome is not a lightweight piece of software! Running too many instances of it at the same time might overload your PC or saturate your network connection.
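With a ThreadPoolExecutor, max_workers is the upper bound on how many tasks run at once, so it also caps the number of browsers open simultaneously. The sketch below checks this with a fake task instead of Selenium; MAX_BROWSERS, fake_browser_session, and the sleep duration are assumptions made for the demonstration.

```python
import concurrent.futures as cf
import threading
import time

MAX_BROWSERS = 2  # assumed limit; tune it for your machine

state = {"active": 0, "peak": 0}
state_lock = threading.Lock()

def fake_browser_session(pair):
    """Stand-in for the Selenium work; it only tracks how many sessions overlap."""
    with state_lock:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
    time.sleep(0.05)  # pretend to drive a browser
    with state_lock:
        state["active"] -= 1
    return pair

pairs = [(0, 10), (10, 20), (20, 30), (30, 34)]
with cf.ThreadPoolExecutor(max_workers=MAX_BROWSERS) as executor:
    done = list(executor.map(fake_browser_session, pairs))

print(state["peak"] <= MAX_BROWSERS)  # True
```

All four pairs are still processed; the extra ones simply wait until a worker is free.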