How to change variable in a while loop from hardcoded to incorporate pandas iterrows loop?

Time:03-11

I have the following while loop to scrape info from a platform:

while result_count != 0:
    start_at = "startAt=" + str(start_index)
    url = base_url + toget + "&" + start_at + "&" + max_results
    response = requests.get(url, auth=(username, password))
    json_response = json.loads(response.text)
    print(json_response)
    page_info = json_response["meta"]["pageInfo"]
    start_index = page_info["startIndex"] + allowed_results
    result_count = page_info["resultCount"]
    items2 = json_response["data"]
    print(items2)

The 'toget' variable is a DataFrame that contains different IDs. I need the loop to iterate through every element of that DataFrame column, using a different ID on each pass, as this is the only way to scrape all the information properly.

import pandas as pd
toget = {'id': [3396750, 3396753, 3396755, 3396757, 3396759]}

Answer:

If you need to loop through a pandas DataFrame, I recommend reviewing this post: How to iterate over rows in a DataFrame in Pandas
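For the pandas case in the title, here is a minimal sketch, assuming toget is first wrapped in a DataFrame (the name df is hypothetical). Iterating the id column directly is the simplest option; iterrows() yields (index, row) pairs, which is only worth it if you also need other columns from each row.

```python
import pandas as pd

toget = {'id': [3396750, 3396753, 3396755, 3396757, 3396759]}
df = pd.DataFrame(toget)  # hypothetical: build a DataFrame from the dict

# Option 1: iterate the column itself; each value is one ID
ids_from_column = [record_id for record_id in df['id']]

# Option 2: iterrows() yields (index, row) pairs, where row is a Series;
# int() turns the NumPy integer into a plain Python int
ids_from_rows = [int(row['id']) for _, row in df.iterrows()]

print(ids_from_column)
print(ids_from_rows)
```

Either list can then drive the outer loop that builds each URL. Iterating the column is usually faster than iterrows(), which constructs a new Series for every row.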

The code in your question declares toget as a dict, not a DataFrame. If that's the case, you can loop through it with the code below:

Looping through a dict

toget = {'id': [3396750, 3396753, 3396755, 3396757, 3396759]}

for i in toget.get('id'):
    print(i)

Answer:

Add an outer for loop that iterates through your list of IDs, and use the loop variable in the URL.

A few other things I'd clean up here:

  1. I would use f-string syntax (f'{...}') for the URL, but how you had it is fine; that's just preference, as I think it's easier to read.
  2. There's no need to use the json package to parse the response. requests can do that directly with response.json().
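To illustrate point 1, this small sketch shows that an f-string builds the same URL as the string concatenation in the question. The values of base_url and max_results below are made-up placeholders, not the real platform's parameters.

```python
# Placeholder values for illustration only; the real base_url and
# max_results come from the platform being scraped.
base_url = "https://example.com/api/items?id="
max_results = "maxResults=50"
record_id = 3396750
start_index = 0

# Concatenation, as in the question (str() is needed for the integers)
concat_url = base_url + str(record_id) + "&" + "startAt=" + str(start_index) + "&" + max_results

# Equivalent f-string: the integers are converted to text automatically
fstring_url = f"{base_url}{record_id}&startAt={start_index}&{max_results}"

print(fstring_url)  # https://example.com/api/items?id=3396750&startAt=0&maxResults=50
```

Besides readability, the f-string avoids the easy-to-miss str() calls, and a forgotten + between pieces becomes impossible.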

I'm also assuming that you set initial values for start_index and max_results (and define base_url, allowed_results, username and password) before the loop; otherwise this code will raise a NameError the first time it uses them.

Code:

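import requests

toget = {'id': [3396750, 3396753, 3396755, 3396757, 3396759]}

# base_url, start_index, max_results, allowed_results, username and
# password are assumed to be defined before this point.
first_index = start_index

for each_id in toget['id']:
    # Reset paging for every ID; otherwise result_count is still 0 from the
    # previous ID and the while loop is skipped for all remaining IDs.
    start_index = first_index
    result_count = None  # anything other than 0 enters the loop at least once

    while result_count != 0:
        start_at = f"startAt={start_index}"
        url = f"{base_url}{each_id}&{start_at}&{max_results}"
        response = requests.get(url, auth=(username, password))
        json_response = response.json()  # requests parses the JSON body directly
        print(json_response)
        page_info = json_response["meta"]["pageInfo"]
        start_index = page_info["startIndex"] + allowed_results
        result_count = page_info["resultCount"]
        items2 = json_response["data"]
        print(items2)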
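The same loop can also be factored into a function, which makes the per-ID reset explicit and lets the paging logic be tested without network access. This is a sketch under assumptions: fetch_page is a hypothetical callable standing in for the requests.get(...).json() call, and the response shape (meta.pageInfo.startIndex, meta.pageInfo.resultCount and data) is taken from the question rather than from any documented API.

```python
from typing import Callable, Iterable, Iterator


def scrape_all(
    ids: Iterable[int],
    fetch_page: Callable[[int, int], dict],
    first_index: int,
    allowed_results: int,
) -> Iterator[tuple[int, object]]:
    """Yield (id, data) for every page of every ID.

    fetch_page(record_id, start_index) is a hypothetical callable that
    returns the parsed JSON for one page.
    """
    for record_id in ids:
        start_index = first_index  # restart paging for each ID
        result_count = None        # anything but 0 runs at least one request
        while result_count != 0:
            page = fetch_page(record_id, start_index)
            page_info = page["meta"]["pageInfo"]
            start_index = page_info["startIndex"] + allowed_results
            result_count = page_info["resultCount"]
            yield record_id, page["data"]


def fake_fetch(record_id: int, start_index: int) -> dict:
    """Stand-in for the HTTP call: two pages of results per ID, then an empty page."""
    count = 10 if start_index < 20 else 0
    return {
        "meta": {"pageInfo": {"startIndex": start_index, "resultCount": count}},
        "data": [f"{record_id}-{start_index}"] if count else [],
    }


results = list(scrape_all([3396750, 3396753], fake_fetch, first_index=0, allowed_results=10))
for record_id, data in results:
    print(record_id, data)
```

With the real platform, fetch_page could wrap the request, for example lambda i, s: requests.get(f"{base_url}{i}&startAt={s}&{max_results}", auth=(username, password)).json(). Because the generator yields pages as they arrive, you can also stop early without fetching the remaining IDs.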