Iterations are getting overwritten

Hope whoever is reading this is well.

What I am trying to do is extract a table of data from the NIST ILThermo website on the viscosity of pure (single-component) ionic liquids, along with the conditions under which it was measured. I am using this code, written by a user called HedgeHog. However, it overwrites itself: instead of showing all the different temperatures and their viscosities, it shows the last temperature and viscosity in every row of the table.

Here is the code:

import requests
import pandas as pd

prop = 'jVUM'
url = f'https://ilthermo.boulder.nist.gov/ILT2/ilsearch?cmp=&ncmp=1&year=&auth=&keyw=&prp={prop}'

ref_data = requests.get(url).json()
#This line makes an HTTP GET request to the API endpoint specified by the url variable. 
#The response is converted to a JSON object and stored in the ref_data variable.

data = []
# This line initializes an empty list data that will be used to store the final processed data.

for e in ref_data['res'][:1]:
#This line starts a for loop over the elements of the res key of the ref_data JSON object
#(remove the [:1] slice to iterate over all of them). The variable e holds each element in turn.

    d = dict(zip(ref_data['header'],e))
#This line creates a dictionary d by zipping the header key of the ref_data JSON object with the value of each 
#iteration e using the zip function.

    set_data = requests.get(f"https://ilthermo.boulder.nist.gov/ILT2/ilset?set={d['setid']}").json()
#This line makes another HTTP GET request to retrieve additional data for each setid. The response is converted 
#to a JSON object and stored in the set_data variable. The setid is passed as a query parameter in the URL.

    header = [item for items in set_data['dhead'] for item in items if item and item != 'Liquid']
    header.append('Liquid')
#These lines build a header by flattening the dhead key of the set_data JSON object, skipping empty entries and 'Liquid', and then appending the string 'Liquid' at the end.

    for x in [[item for items in sublist for item in items] for sublist in set_data['data']]:
#This line starts another for loop that will iterate through the data in the data key of the set_data JSON object. 
#The variable x will hold the value of each iteration.

        d.update(
            dict(
                zip(header, x)
            )
        )
        data.append(d)
#These lines update the d dictionary with the header zipped against the current row x, then append d to the data list.

pd.DataFrame(data)
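The two nested comprehensions in this code are the least obvious part. Below is an offline sketch that runs them on a hand-made payload. The shape of `dhead` and `data` used here (lists of small lists, with `None` padding in the header groups) is an assumption inferred from the code, not taken from the ILThermo documentation, and the numbers are invented.

```python
# Hypothetical payload shaped like the 'ilset' JSON the question's code reads.
# The exact structure of the real API response is an assumption; the values are made up.
set_data = {
    "dhead": [["Temperature, K", None], ["Pressure, kPa", None], ["Viscosity, Pa&#8226;s", "Liquid"]],
    "data": [
        [[298.15], [101.325], [0.0627, 0.0014]],
        [[303.15], [101.325], [0.0625, 0.0014]],
    ],
}

# Same comprehension as the question: drop empty entries and 'Liquid', then put 'Liquid' last.
header = [item for items in set_data["dhead"] for item in items if item and item != "Liquid"]
header.append("Liquid")

# Same comprehension as the question: flatten each row's list of cell lists into one flat list.
rows = [[item for items in sublist for item in items] for sublist in set_data["data"]]

for row in rows:
    print(dict(zip(header, row)))
```

Each flattened row lines up position by position with the flattened header, which is why `zip(header, x)` yields one measurement's worth of column/value pairs. In this made-up sample, the second number in the viscosity cell ends up under the 'Liquid' column.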

And here is the output

    setid   ref prp phases  cmp1    cmp2    cmp3    np  nm1 Temperature, K  Pressure, kPa   Viscosity, Pa&#8226;s   Liquid
0   dQYEM   Safarov et al. (2021b)  Viscosity   Liquid  AAiERH  None    None    500 1-ethyl-3-methylimidazolium dicyanamide 453.193 101.325 0.00744 0.00018
1   dQYEM   Safarov et al. (2021b)  Viscosity   Liquid  AAiERH  None    None    500 1-ethyl-3-methylimidazolium dicyanamide 453.193 101.325 0.00744 0.00018
2   dQYEM   Safarov et al. (2021b)  Viscosity   Liquid  AAiERH  None    None    500 1-ethyl-3-methylimidazolium dicyanamide 453.193 101.325 0.00744 0.00018
3   dQYEM   Safarov et al. (2021b)  Viscosity   Liquid  AAiERH  None    None    500 1-ethyl-3-methylimidazolium dicyanamide 453.193 101.325 0.00744 0.00018
4   dQYEM   Safarov et al. (2021b)  Viscosity   Liquid  AAiERH  None    None    500 1-ethyl-3-methylimidazolium dicyanamide 453.193 101.325 0.00744 0.00018
... ... ... ... ... ... ... ... ... ... ... ... ... ...
495 dQYEM   Safarov et al. (2021b)  Viscosity   Liquid  AAiERH  None    None    500 1-ethyl-3-methylimidazolium dicyanamide 453.193 101.325 0.00744 0.00018
496 dQYEM   Safarov et al. (2021b)  Viscosity   Liquid  AAiERH  None    None    500 1-ethyl-3-methylimidazolium dicyanamide 453.193 101.325 0.00744 0.00018
497 dQYEM   Safarov et al. (2021b)  Viscosity   Liquid  AAiERH  None    None    500 1-ethyl-3-methylimidazolium dicyanamide 453.193 101.325 0.00744 0.00018
498 dQYEM   Safarov et al. (2021b)  Viscosity   Liquid  AAiERH  None    None    500 1-ethyl-3-methylimidazolium dicyanamide 453.193 101.325 0.00744 0.00018
499 dQYEM   Safarov et al. (2021b)  Viscosity   Liquid  AAiERH  None    None    500 1-ethyl-3-methylimidazolium dicyanamide 453.193 101.325 0.00744 0.00018
500 rows × 13 columns

It is not very clear, but as you can see, 453.193 K is the temperature in the final row of the table I am extracting data from. Every row repeats these final viscosity conditions instead of showing the variation across the table.
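This symptom is consistent with list aliasing: `data.append(d)` stores a reference to the same dictionary on every pass through the x loop, and `d.update(...)` keeps overwriting that one dictionary, so every entry in `data` shows whatever the last row wrote. A minimal offline sketch with made-up values (the `"example"` setid and the numbers are invented) reproduces the effect and shows one way around it, building a new dict per row:

```python
# Offline reproduction of the symptom with made-up values (no network access needed).
header = ["Temperature, K", "Viscosity, Pa&#8226;s"]
rows = [[298.15, 0.0627], [303.15, 0.0625], [308.15, 0.0605]]

# Pattern from the question: one dict is updated in place and appended on every pass.
d = {"setid": "example"}
shared = []
for x in rows:
    d.update(zip(header, x))
    shared.append(d)  # appends a reference to the same dict object every time

# Every entry is the same object, so every entry shows the values of the last row.
print([row["Temperature, K"] for row in shared])  # [308.15, 308.15, 308.15]

# Building a new dict per row keeps each measurement separate.
d = {"setid": "example"}
separate = []
for x in rows:
    separate.append({**d, **dict(zip(header, x))})  # copy the shared metadata, add this row

print([row["Temperature, K"] for row in separate])  # [298.15, 303.15, 308.15]
```

Copying per row gives one DataFrame row per measurement, which differs from the list-per-column layout the answer below produces; both avoid the overwrite.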

I think the error is in the x loop, but I can't seem to figure it out. If anyone has any advice, that would be great.

Thanks :)

Answer:

Every row in your observations (the result of your second API request) needs its data appended to a list. The way I would do that is with dict.setdefault():

import requests
import pandas

prop = 'jVUM'  # ILThermo property code used in the question (viscosity)
url = f'https://ilthermo.boulder.nist.gov/ILT2/ilsearch?cmp=&ncmp=1&year=&auth=&keyw=&prp={prop}'
ref_data = requests.get(url).json()
ref_headers = ref_data['header']
ref_rows = ref_data['res'][:5]   ## test with 5 rows max
# One dict per data set, keyed by the search-result headers
ref_data = [dict(zip(ref_headers, row)) for row in ref_rows]
for ref_row in ref_data:
    # Fetch the individual measurements for this data set
    set_data = requests.get(f"https://ilthermo.boulder.nist.gov/ILT2/ilset?set={ref_row['setid']}").json()
    # Flatten the nested column headers, dropping empty entries
    set_headers = [
        item
        for rows in set_data['dhead']
        for item in rows
        if item
    ]
    # Flatten each measurement row the same way
    set_rows = [
        [
            cell
            for item in rows
            for cell in item
            if cell
        ]
        for rows in set_data['data'][:5]   ## test with 5 rows max
    ]
    set_data = [zip(set_headers, row) for row in set_rows]
    for observation in set_data:
        for key, value in observation:
            # Create the list the first time a column is seen, then keep appending,
            # so values from earlier rows are never overwritten
            ref_row.setdefault(key, []).append(value)

df = pandas.DataFrame(ref_data)
print(df)

That gives me:

   setid                     ref        prp  ...                                  Pressure, kPa                     Viscosity, Pa&#8226;s                                    Liquid
0  dQYEM  Safarov et al. (2021b)  Viscosity  ...  [101.325, 101.325, 101.325, 101.325, 101.325]  [0.0627, 0.0625, 0.0605, 0.0604, 0.0635]  [0.0014, 0.0014, 0.0014, 0.0014, 0.0014]
1  oyqwN  Safarov et al. (2018c)  Viscosity  ...  [101.325, 101.325, 101.325, 101.325, 101.325]         [8.278, 7.97, 7.82, 7.509, 6.771]            [0.88, 0.83, 0.81, 0.78, 0.69]
2  jOYXL  Safarov et al. (2017a)  Viscosity  ...                      [100, 100, 100, 100, 100]       [0.797, 0.784, 0.745, 0.736, 0.688]        [0.045, 0.044, 0.041, 0.04, 0.037]
3  dRKZY  Sequeira et al. (2020)  Viscosity  ...                      [210, 210, 210, 210, 210]  [0.0432, 0.0433, 0.0433, 0.0434, 0.0434]  [0.0018, 0.0018, 0.0018, 0.0018, 0.0018]
4  rpzLs   Safarov et al. (2022)  Viscosity  ...  [101.325, 101.325, 101.325, 101.325, 101.325]  [0.1108, 0.1076, 0.1111, 0.1087, 0.0999]   [0.009, 0.0084, 0.0091, 0.0086, 0.0071]

I hope that is what you are after.
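If you would rather have one row per measurement than one row per data set with list-valued columns, one possible follow-up is `DataFrame.explode` with a list of columns (supported in pandas 1.3 and later), which unpacks parallel lists into separate rows. This is an offline sketch: the `ref_row` below is made up to mimic the shape the `setdefault()` loop produces, and it assumes every exploded column holds lists of the same length in each row. With the real `df`, you would pass its measured-quantity column names instead.

```python
import pandas as pd

# Made-up stand-in for one entry of ref_data after the setdefault() loop:
# scalar metadata plus one list per measured quantity.
ref_row = {"setid": "example", "ref": "Example et al."}
header = ["Temperature, K", "Pressure, kPa", "Viscosity, Pa&#8226;s"]
observations = [
    [298.15, 101.325, 0.0627],
    [303.15, 101.325, 0.0625],
]
for obs in observations:
    for key, value in zip(header, obs):
        ref_row.setdefault(key, []).append(value)  # create the list once, then keep appending

df = pd.DataFrame([ref_row])
# Unpack the parallel lists into one row per measurement; scalar columns are repeated.
long_df = df.explode(header, ignore_index=True)
print(long_df)
```

The scalar columns (`setid`, `ref`) are repeated on each exploded row, so every measurement keeps its data-set metadata.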