I have the following while loop to scrape info from a platform:
while result_count != 0:
    start_at = "startAt=" + str(start_index)
    url = base_url + toget + "&" + start_at + "&" + max_results
    response = requests.get(url, auth=(username, password))
    json_response = json.loads(response.text)
    print(json_response)
    page_info = json_response["meta"]["pageInfo"]
    start_index = page_info["startIndex"] + allowed_results
    result_count = page_info["resultCount"]
    items2 = json_response["data"]
    print(items2)
The 'toget' variable is a DataFrame that contains different ids. I need 'toget' to iterate through all elements of a pandas DataFrame column, returning a different id each time, as this is the only way to scrape all the information properly.
import pandas as pd
toget = {'id': [3396750, 3396753, 3396755, 3396757, 3396759]}
CodePudding user response:
If you need to loop through a pandas DataFrame, I recommend reviewing this post: "How to iterate over rows in a DataFrame in Pandas".
The code in your question declares toget as a dict, not a DataFrame. If that's the case, you can use the code below to loop through it:
Looping through Dict
toget = {'id': [3396750, 3396753, 3396755, 3396757, 3396759]}
for i in toget.get('id'):
    print(i)
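If toget really is a DataFrame, as the question describes, a minimal sketch of iterating over its id column might look like the following. The base_url value here is a hypothetical placeholder, not the real endpoint, and the URL format is only an assumption for illustration.

```python
import pandas as pd

# Same ids as in the question, wrapped in a DataFrame
toget = pd.DataFrame({'id': [3396750, 3396753, 3396755, 3396757, 3396759]})

# Hypothetical placeholder; replace with the real endpoint
base_url = "https://example.com/api/items?id="

urls = []
# .tolist() converts the column to a plain Python list of ints
for each_id in toget['id'].tolist():
    urls.append(f"{base_url}{each_id}")

print(urls[0])
```

Selecting the single column and looping over its values is usually simpler than iterating over whole rows when only one field is needed.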
CodePudding user response:
Just add the for loop to iterate through your list and use that variable in the url.
A few other things I'd clean up here:
- I would use f'{}' syntax for the url, but how you had it is fine...just preference, as I think it's easier to read
- No need to use the json package to read in the response. You can do that straight away (see edit below)
I'm also assuming that you set initial values for both start_index and max_results, as this code will raise a NameError for those variables if they are not defined by the time it enters the while loop.
Code:
import pandas as pd
import requests

toget = {'id': [3396750, 3396753, 3396755, 3396757, 3396759]}

for eachId in toget['id']:
    # start_index and result_count need to be reset here for each id,
    # otherwise the while loop is skipped once the first id is finished
    while result_count != 0:
        start_at = "startAt=" + str(start_index)
        url = f'{base_url}{eachId}&{start_at}&{max_results}'
        response = requests.get(url, auth=(username, password))
        # requests can decode the JSON body directly, no json package needed
        json_response = response.json()
        print(json_response)
        page_info = json_response["meta"]["pageInfo"]
        start_index = page_info["startIndex"] + allowed_results
        result_count = page_info["resultCount"]
        items2 = json_response["data"]
        print(items2)
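One thing to watch with the nested loops is that the paging state has to start over for every id. The sketch below shows one way that could be structured. It assumes resultCount is the number of items on the current page and drops to 0 past the last page, which matches the loop condition in the question but is not confirmed. fetch_all, fetch_page, fake_fetch_page, FAKE_DATA and page_size are hypothetical names introduced here; fake_fetch_page stands in for the requests.get call so the example runs without network access.

```python
def fetch_all(ids, fetch_page, page_size):
    """Collect the items for every id, paging until a page comes back empty."""
    items = []
    for each_id in ids:
        start_index = 0  # reset the paging offset for each id
        while True:
            page = fetch_page(each_id, start_index, page_size)
            page_info = page["meta"]["pageInfo"]
            if page_info["resultCount"] == 0:
                break  # no more results for this id
            items.extend(page["data"])
            start_index = page_info["startIndex"] + page_size
    return items


# Hypothetical in-memory data standing in for the remote platform
FAKE_DATA = {1: ["a", "b", "c"], 2: ["d"]}


def fake_fetch_page(each_id, start_index, page_size):
    """Return one page in the same shape as the JSON used in the question."""
    chunk = FAKE_DATA[each_id][start_index:start_index + page_size]
    return {
        "meta": {"pageInfo": {"startIndex": start_index, "resultCount": len(chunk)}},
        "data": chunk,
    }


all_items = fetch_all([1, 2], fake_fetch_page, page_size=2)
print(all_items)
```

In real use, fetch_page would build the URL from the id and offset, call requests.get with the credentials, and return response.json().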