How to build a python DataFrame starting json results with different structures (some in list, some

Time:10-28

I want to find out which French university libraries hold 3 books (one URL per book). I fetch the JSON data for these 3 URLs, but the results do not all have the same structure: if a book is held by several French libraries, the libraries come back as a list, but if it is held by only one library, the result is a single dictionary. This breaks the DataFrame I am trying to build.

This is my list of URLs:

urls
['https://www.sudoc.fr/services/multiwhere/23284545X&format=text/json',
 'https://www.sudoc.fr/services/multiwhere/056068646&format=text/json',
 'https://www.sudoc.fr/services/multiwhere/244974632&format=text/json']

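This is my loop:

import requests
import pandas as pd

data = []
for u in urls:
    req = requests.get(u)
    # "library" is a list for several libraries, but a dict for a single one
    wb = req.json()["sudoc"]["query"]["result"]["library"]
    data.append(wb)
data2 = pd.DataFrame(data).stack().apply(pd.Series)
data2

This is what I get: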


     0          latitude    longitude  rcr        shortname
0 0  NaN        48.5871803  7.7551573  674821001  STRASBOURG-BNU
  1  NaN        48.5789749  7.7651191  674822225  STRASBOURG-Orientales
  2  NaN        48.8492618  2.3433311  751052105  PARIS-BIS, Fonds général
  3  NaN        48.8467139  2.3463854  751052116  PARIS-Bib. Sainte Geneviève
  4  NaN        48.8274879  2.3761096  751132108  PARIS-BULAC
1 0  NaN        48.5871803  7.7551573  674821001  STRASBOURG-BNU
  1  NaN        48.8274879  2.3761096  751052201  PARIS-BULAC-IEI J. Darmesteter
  2  NaN        48.846328   2.351046   751055408  PARIS-Bib. Société asiatique
2 0  rcr        NaN         NaN        NaN        NaN
  1  latitude   NaN         NaN        NaN        NaN
  2  shortname  NaN         NaN        NaN        NaN
  3  longitude  NaN         NaN        NaN        NaN

It doesn't work for the last book because its JSON result is a single dictionary rather than a list like the other two.

Could you help me with that?

Thanks! :)

CodePudding user response:

I don't know exactly what the DataFrame should look like in the end, but wrapping the dict in a list should hopefully do the trick :)

import requests
import pandas as pd

urls = ['https://www.sudoc.fr/services/multiwhere/23284545X&format=text/json',
 'https://www.sudoc.fr/services/multiwhere/056068646&format=text/json',
 'https://www.sudoc.fr/services/multiwhere/244974632&format=text/json']

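data = []
for url in urls:
    response = requests.get(url)
    wb = response.json()["sudoc"]["query"]["result"]["library"]
    # A book held by a single library comes back as a dict, not a list:
    # wrap it so every entry in data has the same shape.
    if isinstance(wb, dict):
        wb = [wb]
    data.append(wb)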

data2 = pd.DataFrame(data).stack().apply(pd.Series)
print(data2)
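
If a flat table with one row per library is easier to work with, the same "wrap the dict in a list" idea can be sketched without any network call. This is only an illustration: the helper name as_list, the keys book_a and book_b, and the added book column are hypothetical, and the sample records are copied from the output shown in the question rather than fetched from the service.

```python
import pandas as pd


def as_list(value):
    """Return value as a list: wrap a single dict, copy an existing list."""
    if isinstance(value, dict):
        return [value]
    return list(value)


# Hypothetical stand-ins for the "library" field of two responses:
# one book held by several libraries (list), one held by a single library (dict).
responses = {
    "book_a": [
        {"rcr": "674821001", "shortname": "STRASBOURG-BNU",
         "latitude": "48.5871803", "longitude": "7.7551573"},
        {"rcr": "751132108", "shortname": "PARIS-BULAC",
         "latitude": "48.8274879", "longitude": "2.3761096"},
    ],
    "book_b": {"rcr": "751055408", "shortname": "PARIS-Bib. Société asiatique",
               "latitude": "48.846328", "longitude": "2.351046"},
}

# Build one row per (book, library) pair, whatever shape the input had.
rows = []
for book, library in responses.items():
    for record in as_list(library):
        rows.append({"book": book, **record})

flat = pd.DataFrame(rows)
print(flat)
```

With real responses, the dictionary values would be the "library" field of each response.json(), and the keys could be the URLs or the book identifiers.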