How to fetch data through multiple accounts with threading in Python 3

Time:04-21

I want to implement a function that fetches data in parallel.

Background: information about 100 sites can be fetched from site A.

The same account can't be used more than once at a time, so I created 5 different accounts on site A, which enables me to fetch information with 5 accounts concurrently.

The account info looks like this:

worker1 pawd
worker2 pawd
worker3 pawd
worker4 pawd
worker5 pawd

To get information about site B from site A, you need to send a command like get info for siteB_IP on site A.

Suppose 100 IPs are stored in a list named IPlist.

How can I fetch information for the 100 IPs in parallel using threading with the 5 available accounts, so that all of the information is stored in one variable without conflicts?

Here is what I have tried. The code below cannot be executed because I have not found a way to implement the solution:

import threading

user = 'root'
pwd = 'Changeme123'
# the first step is to log on with the default account
rs = link.send_cmd(r':lognew:' + '"' + user + '","' + pwd + '"')
# then get all neighbor IPs from the logged-on site; parse_multi is used for parsing the data
IPlist = parse_multi(link.send_cmd('get-IP-info:0xffff'))

def Fetchinfo(user, ip):
    rs = link.send_cmd(r':lognew:' + '"' + user + '","' + pwd + '"')
    areainfo = link.send_cmd('get info for ' + ip)


for ip in IPlist:
    # how to handle 100 IPs when only 5 accounts are available?
    thread = threading.Thread(target=Fetchinfo, args=[worker, ip])

CodePudding user response:

Since you don't want calls with the same account ID and password to happen concurrently, you can define a function that loops sequentially through a sub-list of IPs and fetches each one synchronously:

def fetch_data_for_ips(account_id, account_password, ips_to_fetch):
    results = list()
    for ip_to_fetch in ips_to_fetch:
        # fetch with the account_id and password synchronously
        result = ...
        results.append(result)
    return results  # Added this

Then use a thread pool to run the batches concurrently, one batch per account:

from concurrent.futures import ThreadPoolExecutor, as_completed

# Split the workload for each account to fetch
num, remainder = divmod(len(ip_list), len(accounts))
num_ips_for_each_account = num + bool(remainder)

# This gives e.g. [[1,2,3], [4,5,6]], where each sublist is for each account to fetch
ip_lists_for_each_account = [ip_list[i: i + num_ips_for_each_account] for i in range(0, len(ip_list), num_ips_for_each_account)]


# You should only need as many threads as you have accounts
with ThreadPoolExecutor(len(accounts)) as executor:
    # Feel free to use a set instead if you don't need to know which result came from which thread
    futures = dict()
    results = list()

    for (account_id, account_password), ips_to_fetch in zip(accounts, ip_lists_for_each_account):
        future = executor.submit(fetch_data_for_ips, account_id, account_password, ips_to_fetch)
        futures[future] = account_id

    for future in as_completed(futures):
        result = future.result()
        account_id = futures[future]
        print(f'{account_id} fetched these:', result)

        results.extend(result)
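
To make the split above concrete, here is a minimal sketch of the same ceiling-division chunking wrapped in a helper. The name split_into_chunks and the placeholder IP strings are assumptions for illustration and are not part of the original answer. The guard for an empty list is also an addition, since a chunk size of 0 would make range() raise a ValueError.

```python
def split_into_chunks(items, num_chunks):
    """Split items into at most num_chunks consecutive sublists of near-equal size."""
    if not items:
        return []
    # Ceiling division: round the chunk size up so that no item is left over
    size, remainder = divmod(len(items), num_chunks)
    size += bool(remainder)
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical data: 100 placeholder IPs and 5 accounts, as in the question
ip_list = [f'10.0.0.{n}' for n in range(100)]
chunks = split_into_chunks(ip_list, 5)
print([len(chunk) for chunk in chunks])  # [20, 20, 20, 20, 20]

# An uneven case: 10 items over 3 accounts
print([len(chunk) for chunk in split_into_chunks(list(range(10)), 3)])  # [4, 4, 2]
```

Note that rounding the chunk size up can produce fewer chunks than accounts. For example, 6 IPs over 5 accounts gives 3 chunks of 2, so two accounts stay idle. It never produces more chunks than accounts, so zip() does not drop any IPs.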

CodePudding user response:

You can refer to the sample code below, which follows rcshon's suggestion.

def fetch_data_for_ips(account_id,ips_to_fetch):
    results = list()
    for ip_to_fetch in ips_to_fetch:
        # placeholder for the real fetch: just pair the account with the IP
        result = ','.join((account_id,ip_to_fetch))
        results.append(result)
    return results

from concurrent.futures import ThreadPoolExecutor, as_completed
accounts = ['worker1','worker2','worker3','worker4','worker5']
ip_list = [str(_) for _ in range(10)]

# Split the workload for each account to fetch
num, remainder = divmod(len(ip_list), len(accounts))
num_ips_for_each_account = num + bool(remainder)

# This gives e.g. [[1,2,3], [4,5,6]], where each sublist is for each account to fetch
ip_lists_for_each_account = [ip_list[i: i + num_ips_for_each_account] for i in range(0, len(ip_list), num_ips_for_each_account)]

# You should only need as many threads as you have accounts
with ThreadPoolExecutor(len(accounts)) as executor:
    # Feel free to use a set instead if you don't need to know which result came from which thread
    futures = dict()
    results = list()

    for account_id, ips_to_fetch in zip(accounts, ip_lists_for_each_account):
        future = executor.submit(fetch_data_for_ips, account_id, ips_to_fetch)
        futures[future] = account_id

    for future in as_completed(futures):
        result = future.result()
        account_id = futures[future]
        print(f'{account_id} fetched these:', result)

        results.extend(result)

Output (the order of the lines may vary between runs):
worker3 fetched these: ['worker3,4', 'worker3,5']
worker2 fetched these: ['worker2,2', 'worker2,3']
worker1 fetched these: ['worker1,0', 'worker1,1']
worker4 fetched these: ['worker4,6', 'worker4,7']
worker5 fetched these: ['worker5,8', 'worker5,9']
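
The answers above assign each account a fixed batch of IPs up front. If some fetches take much longer than others, a different technique may balance the load better: keep the idle accounts in a thread-safe queue.Queue and let each task borrow one. The following is only a sketch of that alternative, not the approach from the answers. The credentials, the IP strings, the name fetch_one, and the placeholder return value are all assumptions, and the real login and fetch call would go where the comment indicates.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Hypothetical credentials and IPs, for illustration only
accounts = [(f'worker{n}', 'pawd') for n in range(1, 6)]
ip_list = [f'10.0.0.{n}' for n in range(100)]

# Thread-safe pool of idle accounts; an account is in use while it is out of the queue
idle_accounts = queue.Queue()
for account in accounts:
    idle_accounts.put(account)


def fetch_one(ip):
    """Borrow an idle account, fetch one IP, then hand the account back."""
    account_id, account_password = idle_accounts.get()  # blocks until an account is idle
    try:
        # Placeholder for the real login-and-fetch call using this account
        return ip, f'{account_id} fetched {ip}'
    finally:
        # Always return the account, even if the fetch raised an exception
        idle_accounts.put((account_id, account_password))


with ThreadPoolExecutor(max_workers=len(accounts)) as executor:
    # executor.map yields results in input order; only the main thread builds the dict
    results = dict(executor.map(fetch_one, ip_list))

print(len(results))  # 100
```

Because an account is removed from the queue while in use, no two threads can hold the same account at once, and because only the main thread writes to results, the collected data needs no extra locking.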