What function do I use to loop through URL parameters and display the results in a Pandas dataframe?

I've been doing some API requests using Requests and Pandas. Now I'm trying to use a for loop to iterate through a list of URL parameters. When I test using print(), I get the JSON response for the entire list back. What I really want to do is turn the response into a Pandas dataframe but I don't know what function I can use to do that.

import requests
import requests_cache
from requests_cache import CachedSession
import pandas as pd

session = CachedSession()

base_url = "https://api.crossref.org/works/"
for doi in ["10.4324/9780429202483", "10.1177/2053168017702990", "10.1016/j.chb.2019.05.017", "10810730.2017.1421730," "10.1002/wmh3.247", "10.1177/1940161220919082"]:
  url = base_url + str(doi)
  response = session.get(url, headers={"mailto":"[email protected]"})
  data = response.json()['message']
  dataframe = pd.json_normalize(data)
  dataframe.head(6)

Basically, I'm trying to create a dataframe like the one below, but with six rows, one for each of the parameters.

[Image: Dataframe]

CodePudding user response:

If I understand you correctly, you want to create a dataframe with one row per parameter.

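pd.json_normalize() returns a one-row dataframe for each response dict. Passing max_level=1 limits how deeply nested keys are flattened into columns, so deeper structures stay in a single cell. Collect the per-DOI dataframes in a list, then concatenate them into one with pd.concat: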

import requests
import pandas as pd

base_url = "https://api.crossref.org/works/"
lst = [
    "10.4324/9780429202483",
    "10.1177/2053168017702990",
    "10.1016/j.chb.2019.05.017",
    "10810730.2017.1421730," "10.1002/wmh3.247",
    "10.1177/1940161220919082",
]

dfs = []
with requests.session() as session:
    for doi in lst:
        url = base_url + str(doi)
        response = session.get(url, headers={"mailto": "[email protected]"})
        data = response.json()["message"]
        dataframe = pd.json_normalize(data, max_level=1)
        dfs.append(dataframe)

df = pd.concat(dfs, ignore_index=True)
print(df[["reference-count", "publisher", "isbn-type"]])  # print only a few columns for brevity

Prints:

   reference-count          publisher                                           isbn-type
0                0          Routledge  [{'value': '9780429202483', 'type': 'electronic'}]
1               31  SAGE Publications                                                 NaN
2               89        Elsevier BV                                                 NaN
3               27              Wiley                                                 NaN
4               54  SAGE Publications                                                 NaN
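
Note that the output above has five rows, not six. In the list as written, the comma in "10810730.2017.1421730," "10.1002/wmh3.247" sits inside the first string literal, so Python joins the two adjacent literals into a single element. The sketch below reproduces that effect and then shows the normalize-then-concat pattern offline. The two sample records are made-up stand-ins for the Crossref "message" payload (an assumption for illustration, not real API output).

```python
import pandas as pd

# The DOI list exactly as written in the question. Adjacent string literals
# are concatenated by Python, so the fourth entry is one merged string.
dois = [
    "10.4324/9780429202483",
    "10.1177/2053168017702990",
    "10.1016/j.chb.2019.05.017",
    "10810730.2017.1421730," "10.1002/wmh3.247",
    "10.1177/1940161220919082",
]
print(len(dois))  # 5, not 6

# Hypothetical records standing in for response.json()["message"].
records = [
    {
        "publisher": "Publisher A",
        "reference-count": 1,
        "issued": {"date-parts": [[2020]], "meta": {"source": "x"}},
    },
    {"publisher": "Publisher B", "reference-count": 2},
]

# One single-row frame per record; max_level=1 flattens "issued" one level
# deep and leaves the inner "meta" dict intact in a single cell.
frames = [pd.json_normalize(rec, max_level=1) for rec in records]

# Stack the frames; columns missing from a record are filled with NaN.
df = pd.concat(frames, ignore_index=True)
print(df)
```

Moving the comma outside the quotes gives the list six elements, and therefore six requests. Whether each of those strings resolves to a record at the API is a separate question that this sketch does not check.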