For loop stops after returning results of 1st element

Time:03-23

I have the following scraping script. I need to loop through many links that differ only by the T_ID values in the `data` dictionary. The script prints results only for the first T_ID. How can I fix this loop so it prints results for every T_ID?

```python
import requests
import json
import csv
import sys

import pandas as pd
from bs4 import BeautifulSoup

data = {'T_ID': [3396750, 3396753, 3396755, 3396757, 3396759]}

base_url = "XXXX"
username = "XXXX"
password = "XXXX"
toget = data

allowed_results = 50
max_results = "maxResults=" + str(allowed_results)
tc = "/tcyc?"

result_count = -1
start_index = 0

df = pd.DataFrame(
    columns=['id', 'name', 'gId', 'dKey', 'tPlan'])

for eachId in toget['T_ID']:
    while result_count != 0:
        start_at = "startAt=" + str(start_index)
        url = f'{base_url}{eachId}{tc}&{start_at}&{max_results}'
        response = requests.get(url, auth=(username, password))
        json_response = json.loads(response.text)
        print(json_response)
        page_info = json_response["meta"]["pageInfo"]
        start_index = page_info["startIndex"] + allowed_results
        result_count = page_info["resultCount"]
        items2 = json_response["data"]
        print(items2)

        for item in items2:
            new_item = {'id': item['id'], **item['fields']}
            df = df.append(new_item, ignore_index=True)
            print(item["id"])
            print(item["project"])
            print(item["fields"]["name"])
            print(item["fields"]["gId"])
            print(item["fields"]["dKey"])
            print(item["fields"]["tPlan"])
```

Answer:

The loop doesn't actually stop; it runs over every ID. The problem is that `start_index` and `result_count` are set once, before the `for` loop, and are never reset. After the first `eachId` has been paged through, `start_index` is no longer 0, so any request for the next ID would look something like:

`'XXXX.com/3396753/tcyc?&startAt=123&maxResults=50'`

Worse, the `while` loop only exits once `result_count` is 0, so when the second ID starts, `while result_count != 0` is already false: the body is skipped and no request is made at all. The same thing happens for every remaining ID.
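The carry-over can be reproduced without any network access. The sketch below replaces the real endpoint with a hypothetical in-memory `fake_page` function (its name, the three-items-per-ID data, and the page size are assumptions; only the `meta.pageInfo` / `data` shape is borrowed from the question) and runs the same loop structure twice: once with the counters initialised only outside the `for`, once with them reset per ID.

```python
# Hypothetical stand-in for the real API: every ID has 3 items, served in
# pages of `page_size`. Only the response shape mirrors the question.
ITEMS = {tid: [f"{tid}-{n}" for n in range(3)] for tid in (101, 102, 103)}


def fake_page(tid, start_at, page_size):
    chunk = ITEMS[tid][start_at:start_at + page_size]
    return {
        "meta": {"pageInfo": {"startIndex": start_at, "resultCount": len(chunk)}},
        "data": chunk,
    }


def collect(ids, page_size=2, reset_per_id=True):
    collected = []
    # Shared state, initialised once (this alone is the bug).
    result_count = -1
    start_index = 0
    for tid in ids:
        if reset_per_id:
            # The fix: fresh pagination state for every ID.
            result_count = -1
            start_index = 0
        while result_count != 0:
            page = fake_page(tid, start_index, page_size)
            info = page["meta"]["pageInfo"]
            start_index = info["startIndex"] + page_size
            result_count = info["resultCount"]
            collected.extend(page["data"])
    return collected


ids = [101, 102, 103]
print(collect(ids, reset_per_id=False))  # ['101-0', '101-1', '101-2']
print(len(collect(ids, reset_per_id=True)))  # 9
```

Without the reset, the first ID ends with `result_count == 0`, so IDs 102 and 103 never enter the `while` loop.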

Move the initial `result_count = -1` and `start_index = 0` inside the `for` loop, just before the `while`, so that they reset for each `T_ID`:

```python
import pandas as pd
import requests

data = {'T_ID': [3396750, 3396753, 3396755, 3396757, 3396759]}

base_url = "XXXX"
username = "XXXX"
password = "XXXX"
toget = data

allowed_results = 50
max_results = "maxResults=" + str(allowed_results)
tc = "/tcyc?"

# Collect plain dicts and build the DataFrame once at the end;
# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0.
rows = []

for eachId in toget['T_ID']:
    # Reset the pagination state for every T_ID
    start_index = 0
    result_count = -1
    while result_count != 0:
        start_at = "startAt=" + str(start_index)
        url = f'{base_url}{eachId}{tc}&{start_at}&{max_results}'
        response = requests.get(url, auth=(username, password))
        response.raise_for_status()  # fail loudly on HTTP errors
        json_response = response.json()
        print(json_response)
        page_info = json_response["meta"]["pageInfo"]
        start_index = page_info["startIndex"] + allowed_results
        result_count = page_info["resultCount"]
        items2 = json_response["data"]
        print(items2)

        for item in items2:
            rows.append({'id': item['id'], **item['fields']})
            print(item["id"])
            print(item["project"])
            print(item["fields"]["name"])
            print(item["fields"]["gId"])
            print(item["fields"]["dKey"])
            print(item["fields"]["tPlan"])

# columns= keeps only these fields, in this order
df = pd.DataFrame(rows, columns=['id', 'name', 'gId', 'dKey', 'tPlan'])
```
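A further option, sketched here under assumptions, is to move the paging into a function so that `start_index` lives in local scope and cannot leak from one ID to the next. `iter_items`, `build_rows`, and the injected `get_page` callable are hypothetical names introduced for illustration, and `fake_get_page` is an in-memory stand-in so the sketch runs without the real server.

```python
from typing import Any, Callable, Dict, Iterator, List

# A page as returned by the API: {"meta": {"pageInfo": {...}}, "data": [...]}
Page = Dict[str, Any]
# Hypothetical signature: get_page(t_id, start_at, page_size) -> Page
PageFetcher = Callable[[int, int, int], Page]


def iter_items(t_id: int, get_page: PageFetcher,
               page_size: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield every item for one T_ID, following startAt/maxResults paging.

    start_index is a local variable, so every call starts from 0 and
    nothing carries over from the previous T_ID.
    """
    start_index = 0
    while True:
        page = get_page(t_id, start_index, page_size)
        info = page["meta"]["pageInfo"]
        if info["resultCount"] == 0:
            break  # past the last page
        yield from page["data"]
        start_index = info["startIndex"] + page_size


def build_rows(t_ids: List[int], get_page: PageFetcher,
               page_size: int = 50) -> List[Dict[str, Any]]:
    """Flatten items the same way as the answer: id plus its fields."""
    return [{"id": item["id"], **item["fields"]}
            for t_id in t_ids
            for item in iter_items(t_id, get_page, page_size)]


def fake_get_page(t_id: int, start_at: int, page_size: int) -> Page:
    """Hypothetical in-memory stand-in for the real endpoint (3 items per ID)."""
    items = [{"id": f"{t_id}-{n}", "fields": {"name": f"item {n}"}}
             for n in range(3)]
    chunk = items[start_at:start_at + page_size]
    return {"meta": {"pageInfo": {"startIndex": start_at,
                                  "resultCount": len(chunk)}},
            "data": chunk}


rows = build_rows([3396750, 3396753], fake_get_page, page_size=2)
print(len(rows))  # 6
```

With the real API, `get_page` would build the same URL as the script (`f'{base_url}{t_id}{tc}&startAt={start_at}&maxResults={page_size}'`), call `requests.get(url, auth=(username, password))`, and return `response.json()`; passing the result of `build_rows(toget['T_ID'], get_page)` to `pd.DataFrame` then gives the same frame as the loop above.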