Dataframe tolist adds [] or the dataframe reads header while going through a for loop. How can I get

Time:03-17

I'm importing data from an Alteryx data flow. It is a single column of integers:

Reading the data in from Alteryx converts it to a DataFrame automatically.

searches = Alteryx.read("dataimport")

SUCCESS: reading input data "dataimport"
      RxNorm_Id
0            99
1           161
2           167
3           168
4           197
...         ...
6711    2562541
6712    2565823
6713    2566308
6714    2566416
6715    2571104

I have a for loop that builds a URL for each search value by substituting it into a segment of the URL.

for search in searches:
    print(f"Scraping {search}")
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"
    print(url)

When I attempt to run the data through the loop, it starts with the header name first:

Scraping RxNorm_Id
https://rxnav.nlm.nih.gov/REST/rxcui/RxNorm_Id/historystatus.json?caller=RxNav

I'm not exactly sure why it uses the header first, but obviously it causes an error because the search doesn't exist.

If I convert the DataFrame to a list, each item is wrapped in square brackets, like this:

Info: Python (2): [[99], [161], [167], [168], [197], [272], [281], [376]]


searches = searches.values.tolist()

Scraping [99]
https://rxnav.nlm.nih.gov/REST/rxcui/[99]/historystatus.json?caller=RxNav

If I hardcode searches as [99,161,167,168,197,272,281,376] my loop works without an issue.

How can I get the initial DataFrame into that format? Or how can I get the tolist function to not wrap each number in square brackets?

I understand that my data source is secure and that using Alteryx prevents others from replicating it, but this should be enough information to solve the issue.

Below is my entire code, trimmed to be easily reproducible:

    from ayx import Alteryx
    from numpy import dtype
    import pandas as pd
    import requests
    
    searches = Alteryx.read("dataimport")
    # searches = searches.values.tolist()

    # for search in searches:   (attempt for the tolist() function)
    for search in [searches]:
        print(f"Scraping {search}")
        url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"

        print(url)
        data = s.get(url,headers=headers).json()      #results from second redirect
        print(data)
        a = data['rxcuiStatusHistory']['definitionalFeatures']
        b = data['rxcuiStatusHistory']['attributes']
        print(b)
        rxcui = b['rxcui']
        name = b['name']
        print(rxcui)
        print(name)
    
        try: 
            baserxcui = a['ingredientAndStrength'][0]['baseRxcui']
            basename = a['ingredientAndStrength'][0]['baseName']
            print(baserxcui)
            print(basename)
        except KeyError:
            baserxcui = rxcui
            basename = name
            print(baserxcui)
            print(basename)
    
        try:  
            bossrxcui = a['ingredientAndStrength'][0]['bossRxcui']
            bossname = a['ingredientAndStrength'][0]['bossName']
            print(bossrxcui)
            print(bossname)
        except KeyError:
            bossrxcui = rxcui
            bossname = name
            print(bossrxcui)
            print(bossname)

CodePudding user response:

After searches.values.tolist(), each search is a one-element list, so you can just take its first item. Your code then becomes something like this:

for search in searches:
    print(f"Scraping {search[0]}")
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search[0]}/historystatus.json?caller=RxNav"
    print(url)
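
As a rough illustration of where the brackets come from, here is a minimal sketch. The small DataFrame is a hypothetical stand-in for what Alteryx.read("dataimport") returns, since the real source cannot be reproduced here; only the column name RxNorm_Id and the sample values are taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by Alteryx.read("dataimport").
searches = pd.DataFrame({"RxNorm_Id": [99, 161, 167]})

# .values is a 2-D array: one row per record, one column per field.
# tolist() therefore produces one inner list per row, even with a single column.
rows = searches.values.tolist()
print(searches.values.shape)  # (3, 1)
print(rows)                   # [[99], [161], [167]]

# Indexing each row with [0] recovers the bare integer.
ids = [row[0] for row in rows]
print(ids)                    # [99, 161, 167]
```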

Or you can simply flatten the list up front by changing

searches = searches.values.tolist()

to

searches = [i[0] for i in searches.values.tolist()]
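
Another option, sketched here under the assumption that the column is named RxNorm_Id as in the printed output, is to select that column before converting. This also shows why the original loop printed the header: iterating over a DataFrame directly yields its column labels rather than its rows. The DataFrame below is a hypothetical stand-in for the Alteryx input.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by Alteryx.read("dataimport").
searches = pd.DataFrame({"RxNorm_Id": [99, 161, 167]})

# Iterating a DataFrame directly walks its column labels, not its rows.
labels = list(searches)
print(labels)  # ['RxNorm_Id']

# Selecting the single column gives a Series; tolist() on it is already flat.
ids = searches["RxNorm_Id"].tolist()
print(ids)  # [99, 161, 167]

for search in ids:
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{search}/historystatus.json?caller=RxNav"
    print(url)
```

With the flat list, the rest of the loop body from the question can stay as it is, because each search is now a plain integer.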