I'm using an import from an Alteryx data flow: a single column of integers in the following format.
Reading the data in from Alteryx converts it to a pandas DataFrame automatically.
searches = Alteryx.read("dataimport")
SUCCESS: reading input data "dataimport"
      RxNorm_Id
0            99
1           161
2           167
3           168
4           197
...         ...
6711    2562541
6712    2565823
6713    2566308
6714    2566416
6715    2571104
I have a for loop that goes through the IDs and substitutes each one into a segment of the URL.
for search in searches:
    print(f"Scraping {search}")
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"
    print(url)
When I run the data through the loop, it starts with the header name:
Scraping RxNorm_Id
https://rxnav.nlm.nih.gov/REST/rxcui/RxNorm_Id/historystatus.json?caller=RxNav
I'm not sure why it uses the header first, but it obviously causes an error because that search term doesn't exist.
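The header appears because iterating over a pandas DataFrame yields its column labels (the same as iterating over `searches.columns`), not its rows or values. A minimal sketch, using a small hypothetical stand-in DataFrame with the question's column name `RxNorm_Id` in place of the real Alteryx input:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by Alteryx.read("dataimport");
# the values are the first few IDs shown in the question.
searches = pd.DataFrame({"RxNorm_Id": [99, 161, 167]})

# Iterating over a DataFrame yields its column labels, so a single-column
# frame produces exactly one value: the header name.
labels = [label for label in searches]
print(labels)  # ['RxNorm_Id']

# Iterating over the column (a Series) yields the values themselves;
# int() converts each NumPy integer to a plain Python int.
ids = [int(search) for search in searches["RxNorm_Id"]]
urls = [
    f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"
    for search in ids
]
print(urls[0])
```

With the real data, the loop would iterate over `searches["RxNorm_Id"]` instead of `searches`.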
If I convert the DataFrame to a list, each item is wrapped in square brackets:
searches = searches.values.tolist()
Info: Python (2): [[99], [161], [167], [168], [197], [272], [281], [376]]
Scraping [99]
https://rxnav.nlm.nih.gov/REST/rxcui/[99]/historystatus.json?caller=RxNav
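The brackets come from the shape of `.values`: it is a 2-D NumPy array with one row per record and one column, and `tolist()` turns each row into its own one-element list. A minimal sketch, again with a hypothetical three-row stand-in for the Alteryx input, comparing that with two ways of getting a flat list:

```python
import pandas as pd

# Hypothetical stand-in for the Alteryx input (one column, three rows).
searches = pd.DataFrame({"RxNorm_Id": [99, 161, 167]})

# .values is 2-D (rows x columns), so tolist() returns a list of row lists.
print(searches.values.shape)  # (3, 1)
nested = searches.values.tolist()
print(nested)                 # [[99], [161], [167]]

# Flattening the array first, or selecting the single column as a Series,
# gives the flat list of IDs that the loop expects.
flat_from_array = searches.values.ravel().tolist()
flat_from_column = searches["RxNorm_Id"].tolist()
print(flat_from_array)        # [99, 161, 167]
```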
If I hardcode searches as [99, 161, 167, 168, 197, 272, 281, 376], my loop works without an issue.
How can I get the initial DataFrame into that format? Or how can I stop tolist() from wrapping each number in square brackets?
I understand that my data source is secure and that, because it comes through Alteryx, you can't replicate it. But this should be enough information to solve the issue.
Below is my entire code, trimmed to make it easier to reproduce:
from ayx import Alteryx
from numpy import dtype
import pandas as pd
import requests
searches = Alteryx.read("dataimport")
# searches = searches.values.tolist()
# for search in searches:  # attempt for the tolist() function
for search in [searches]:
    print(f"Scraping {search}")
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"
    print(url)
    # s (presumably a requests.Session) and headers are not defined in this trimmed excerpt
    data = s.get(url, headers=headers).json()  # results from second redirect
    print(data)
    a = data['rxcuiStatusHistory']['definitionalFeatures']
    b = data['rxcuiStatusHistory']['attributes']
    print(b)
    rxcui = b['rxcui']
    name = b['name']
    print(rxcui)
    print(name)
    # Base ingredient: fall back to the drug's own rxcui/name if it is missing
    try:
        baserxcui = a['ingredientAndStrength'][0]['baseRxcui']
        basename = a['ingredientAndStrength'][0]['baseName']
        print(baserxcui)
        print(basename)
    except KeyError:
        baserxcui = rxcui
        basename = name
        print(baserxcui)
        print(basename)
    # "Boss" ingredient: same fallback
    try:
        bossrxcui = a['ingredientAndStrength'][0]['bossRxcui']
        bossname = a['ingredientAndStrength'][0]['bossName']
        print(bossrxcui)
        print(bossname)
    except KeyError:
        bossrxcui = rxcui
        bossname = name
        print(bossrxcui)
        print(bossname)
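The two try/except blocks fall back to `rxcui` and `name` when the ingredient fields are missing. One way to express the same fallbacks is with `dict.get()`; this is a minimal sketch, assuming the response shape implied by the key paths in the question's code (not verified against the RxNav API), and the function name `extract_fields` and the `sample` response are hypothetical. It behaves slightly differently from the original: each field falls back on its own, and an empty `ingredientAndStrength` list (which would raise `IndexError`, not `KeyError`) is also handled.

```python
from typing import Any, Dict


def extract_fields(data: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the printed fields out of one historystatus.json response.

    Key paths are copied from the question's code; the response shape is assumed.
    """
    history = data["rxcuiStatusHistory"]
    features = history["definitionalFeatures"]
    attributes = history["attributes"]
    rxcui = attributes["rxcui"]
    name = attributes["name"]

    # Use the first ingredientAndStrength entry if there is one; otherwise an
    # empty dict, so every .get() below falls back to rxcui/name.
    strengths = features.get("ingredientAndStrength") or [{}]
    first = strengths[0]

    return {
        "rxcui": rxcui,
        "name": name,
        "baserxcui": first.get("baseRxcui", rxcui),
        "basename": first.get("baseName", name),
        "bossrxcui": first.get("bossRxcui", rxcui),
        "bossname": first.get("bossName", name),
    }


# Hypothetical response with no ingredient data, so the fallbacks apply.
sample = {
    "rxcuiStatusHistory": {
        "attributes": {"rxcui": "99", "name": "example name"},
        "definitionalFeatures": {},
    }
}
print(extract_fields(sample)["baserxcui"])  # 99
```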
Answer:
After searches = searches.values.tolist(), each search is a one-element list, so you can just take its first item. Your code then becomes something like this:
for search in searches:
    print(f"Scraping {search[0]}")
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search[0]}/historystatus.json?caller=RxNav"
    print(url)
Or you can simply change
searches = searches.values.tolist()
to
searches = [i[0] for i in searches.values.tolist()]
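A quick check of the list-comprehension approach, using a hypothetical stand-in DataFrame in place of the Alteryx input. It also shows `iloc[:, 0]`, which selects the first column by position, as an equivalent that does not depend on the column name:

```python
import pandas as pd

# Hypothetical stand-in for the Alteryx input.
searches = pd.DataFrame({"RxNorm_Id": [99, 161, 167]})

# The answer's comprehension: take the single element of each row list.
ids = [i[0] for i in searches.values.tolist()]

# Equivalent: select the first column by position and convert it to a list.
ids_by_position = searches.iloc[:, 0].tolist()

for search in ids:
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"
    print(url)
```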