Home > Software design >  Create new column with multiple values in Python
Create new column with multiple values in Python

Time:08-02

I have a dataframe, which has name of Stations and Links of Measured value of each Station for 2 days

    Station      Link
0   EITZE        https://www.pegelonline.wsv.de/webservices/rest-api/v2/stations/EITZE/W/measurements.json?start=P2D
1   RETHEM       https://www.pegelonline.wsv.de/webservices/rest-api/v2/stations/RETHEM/W/measurements.json?start=P2D
.......
685 BORGFELD     https://www.pegelonline.wsv.de/webservices/rest-api/v2/stations/BORGFELD/W/measurements.json?start=P2D

To take data from json isn't a big problem. But then I realized, that json-link from each station has multiple values from different time, so I don't know how to add these values from each time to a specific station. I tried to get all the values from json, but I can't define, which values from which station, because it's just too many.
Anyone have a solution for me?
The Dataframe i would like to have, should look like this!

        Station     Timestamp                    Value
0       EITZE       2022-07-31T00:30:00 02:00    15
1       EITZE       2022-07-31T00:45:00 02:00    15
.......
100     RETHEM      2022-07-31T00:30:00 02:00    15
101     RETHEM      2022-07-31T00:45:00 02:00    20
.......
xxxx    BORGFELD    2022-08-02T00:32:00 02:00    608

CodePudding user response:

Starting with this example data frame:

  Station                                               Link
0   EITZE  https://www.pegelonline.wsv.de/webservices/res...
1  RETHEM  https://www.pegelonline.wsv.de/webservices/res...

You could leverage apply to populate an accumulation data frame.

import requests
import json

Define the function to be used by apply

def get_link(x):
    global accum_df 
    
    r = requests.get(x['Link'])
    
    if r.status_code == 200:
        ldf = pd.DataFrame(json.loads(r.text))
        ldf['station'] = x['Station']
        accum_df = pd.concat([accum_df,ldf])
    
    else:
        print(r.status_code)    # handle the error
    
    return None

Apply it

accum_df = pd.DataFrame()
df.apply(get_link, axis=1)
print(accum_df)

Result

                     timestamp  value station
0    2022-07-31T02:00:00 02:00  220.0   EITZE
1    2022-07-31T02:15:00 02:00  220.0   EITZE
2    2022-07-31T02:30:00 02:00  220.0   EITZE
3    2022-07-31T02:45:00 02:00  220.0   EITZE
4    2022-07-31T03:00:00 02:00  219.0   EITZE
..                         ...    ...     ...
181  2022-08-02T00:00:00 02:00   23.0  RETHEM
182  2022-08-02T00:15:00 02:00   23.0  RETHEM
183  2022-08-02T00:30:00 02:00   23.0  RETHEM
184  2022-08-02T00:45:00 02:00   23.0  RETHEM
185  2022-08-02T01:00:00 02:00   23.0  RETHEM
  • Related