I need to populate a column in a dataframe based on the result of a web service call that uses data from another column. The web service I am using only allows 3 concurrent requests at a time. I was wondering whether I can control that when using the pandas apply() method, or whether I should look for an alternative (e.g. looping through the dataframe records three at a time). Here is some sample code:
import pandas as pd
import requests

# Function to call a web service
# Returns 'NonActive' for non-active and 'Active' for active
def get_status(x):
    # Build the request URL from the id value
    status_web_service = 'http://www.example.com/?id=' + x
    response = requests.get(status_web_service)
    # .text is an attribute of the response, not a method
    return response.text

# Main body starts here
df = pd.DataFrame([['1', 'Jane'], ['2', 'John']], columns=['id', 'Name'])
df['Status'] = df['id'].apply(get_status)
My expected output (content of the dataframe) would be something like:
id, Name, Status
1, Jane, Active
2, John, NonActive
Any suggestion is welcome, either to work the issue this way or through a better alternative.
Thank you.
CodePudding user response:
.apply() calls your function for one row at a time and waits for it to return before moving on to the next row, so the requests are made sequentially. There is never more than one request in flight, which means the limit of 3 concurrent requests will not be hit. However, if the web service also limits request frequency, consider adding a short pause (e.g. time.sleep()) inside the function you pass to apply().
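
As a rough sketch of both ideas, the snippet below first spaces out sequential calls with time.sleep(), then shows one possible way to run up to 3 requests at once if the sequential version turns out to be too slow. Note that fake_get_status and throttled are hypothetical helpers made up for illustration, and the 0.1 second delay is an arbitrary assumption. The stand-in function replaces the real web service call so the example runs offline; in practice you would pass your own get_status instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def fake_get_status(record_id):
    # Hypothetical stand-in for the real web service call (no network needed).
    return 'Active' if record_id == '1' else 'NonActive'


def throttled(func, delay_seconds):
    # Hypothetical helper: wraps func so each call is followed by a pause,
    # which spaces out requests made one after another.
    def wrapper(value):
        result = func(value)
        time.sleep(delay_seconds)
        return result
    return wrapper


df = pd.DataFrame([['1', 'Jane'], ['2', 'John']], columns=['id', 'Name'])

# Option 1: sequential calls via apply(), only one request in flight at a time.
df['Status'] = df['id'].apply(throttled(fake_get_status, delay_seconds=0.1))

# Option 2: at most 3 calls run concurrently; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    df['StatusParallel'] = list(pool.map(fake_get_status, df['id']))
```

With max_workers=3 the pool never runs more than three calls at the same time, which matches the limit described in the question. Whether the extra complexity is worth it depends on how slow the service is and how many rows you have.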