Use dataframe row values as input for an API request

I have a dataframe, let's say like this:

id	someValue
12345	A
67890	B
98765	C
43210	D
10987	E
12321	F

How can I use individual row values of the id column as inputs in an API request body and loop for each row? As in, the first request, use id value of row 1, then the second request use id value of row 2, row 3, row 4, etc.

The API request looks like this:

import requests
import json

url = "https://api-url.com/api/v1/endpoint"

payload="{\"query\":\"{\\n  report(id:\\\"{ID VALUE OF ROW A}\\\") {\\n    reportFieldA {\\n      name\\n      dimension\\n    }\\n  }\\n}\",\"variables\":{}}"

headers = {
  'authorization': authtoken  # API token, defined elsewhere
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

CodePudding user response：

To loop over the rows of a Spark dataframe on the driver, you can use collect(). It's not recommended for large dataframes, because all the data is then moved to the driver, but for a small dataframe like this one it is fine.

Something like the following should work:

def my_api_call(id):
    url = "https://api-url.com/api/v1/endpoint"

    payload = fr"""{{"query":"{{\n  report(id:\"{id}\") {{\n    reportFieldA {{\n      name\n      dimension\n    }}\n  }}\n}}","variables":{{}}}}"""

    headers = {
      'authorization': authtoken  # API token, defined elsewhere
    }

    response = requests.request("POST", url, headers=headers, data=payload)

    print(response.text)

ids = [r['id'] for r in df.collect()]

for id in ids:
    my_api_call(id)

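I have also changed your payload into an f-string so that the id can be interpolated. I added the r prefix (raw string) as well, so that you do not need to escape every " and \ for Python; the backslash sequences are sent exactly as written. Inside an f-string, literal { and } are escaped by doubling them: {{ and }}.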
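
If the manual escaping gets fiddly, one alternative is to let json.dumps do it. The following is only a minimal sketch and not the method used above: build_payload is a hypothetical helper name, and the query layout is copied from the question.

```python
import json


def build_payload(report_id):
    """Return the request body as a JSON string for one report id."""
    # Only the line that interpolates the id is an f-string, so only
    # its literal brace needs doubling.
    query = (
        "{\n"
        f'  report(id:"{report_id}") {{\n'
        "    reportFieldA {\n"
        "      name\n"
        "      dimension\n"
        "    }\n"
        "  }\n"
        "}"
    )
    # json.dumps escapes the quotes and newlines inside the query text.
    return json.dumps({"query": query, "variables": {}})


payload = build_payload(12345)
```

The returned string can be passed as data=payload exactly as in my_api_call. Whether the id could instead be passed through the "variables" field depends on the API's schema, which is not shown here.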

CodePudding user response：

You can use rdd.map to build the payload for each row of the dataframe:

>>> def func1(x):
...     id=x.id
...     someValue=x.someValue
...     newpayload=payload.replace('ID VALUE OF ROW A',someValue)
...     return (id,someValue,newpayload)
...
>>> rdd2=df.rdd.map(lambda x: func1(x))
>>> df2=rdd2.toDF()
>>> df2=df2.withColumnRenamed("_1","id").withColumnRenamed("_2","someValue").withColumnRenamed("_3","payload")
>>> df2.show(truncate=False)
+-----+---------+----------------------------------------------------------------------------------------------------------------+
|id   |someValue|payload                                                                                                         |
+-----+---------+----------------------------------------------------------------------------------------------------------------+
|12345|A        |{"query":"{\n  report(id:{A}) {\n    reportFieldA {\n      name\n      dimension\n    }\n  }\n}","variables":{}}|
|67890|B        |{"query":"{\n  report(id:{B}) {\n    reportFieldA {\n      name\n      dimension\n    }\n  }\n}","variables":{}}|
|98765|C        |{"query":"{\n  report(id:{C}) {\n    reportFieldA {\n      name\n      dimension\n    }\n  }\n}","variables":{}}|
|43210|D        |{"query":"{\n  report(id:{D}) {\n    reportFieldA {\n      name\n      dimension\n    }\n  }\n}","variables":{}}|
|10987|E        |{"query":"{\n  report(id:{E}) {\n    reportFieldA {\n      name\n      dimension\n    }\n  }\n}","variables":{}}|
|12321|F        |{"query":"{\n  report(id:{F}) {\n    reportFieldA {\n      name\n      dimension\n    }\n  }\n}","variables":{}}|
+-----+---------+----------------------------------------------------------------------------------------------------------------+

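Similarly, you can add a header column and pass the payload and all the other values to the request. Note that func1 substitutes someValue into the payload; to send the id instead, as the question asks, pass str(id) to replace(). Hope this helps.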
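
To actually send the requests from the executors, rather than only building the payloads, one option is df.foreachPartition. This is a rough sketch under assumptions, not a tested integration: post_partition, make_payload and the send parameter are hypothetical names, the URL is the placeholder from the question, and authtoken has to be replaced with a real token.

```python
import json

URL = "https://api-url.com/api/v1/endpoint"  # placeholder URL from the question


def make_payload(report_id):
    # Same query as in the question, serialized with json.dumps.
    query = (
        f'{{\n  report(id:"{report_id}") {{\n'
        "    reportFieldA {\n      name\n      dimension\n    }\n  }\n}"
    )
    return json.dumps({"query": query, "variables": {}})


def post_partition(rows, send=None, authtoken="<your token>"):
    """Send one POST per row; intended for df.foreachPartition."""
    if send is None:
        # Imported lazily so that the import runs on the executor.
        import requests

        session = requests.Session()

        def send(body):
            return session.post(
                URL, headers={"authorization": authtoken}, data=body
            )

    for row in rows:
        # Spark Row objects support lookup by column name.
        send(make_payload(row["id"]))
```

With a Spark dataframe this would be called as df.foreachPartition(post_partition). The send parameter exists only so that the function can be tried without network access, for example by passing a list's append method and inspecting the collected payloads.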