How to generate a csv file with pandas for given start/end dates & interval?

Time:07-12

I'm very new to coding and Stack Overflow, so my apologies if my code is clunky. I'm adjusting some code from Tim Supinie (https://github.com/tsupinie/vad-plotter) to run through a given time frame and plot hodographs for these times. I've also created a csv file of params in this loop. I'll include the code that I think is relevant below.

def main():
    
    ap = argparse.ArgumentParser()
    ap.add_argument('radar_id', help="The 4-character identifier for the radar (e.g. KTLX, KFWS, etc.)")
    ap.add_argument('-m', '--storm-motion', dest='storm_motion', help="Storm motion vector. It takes one of two forms. The first is either 'BRM' for the Bunkers right mover vector, or 'BLM' for the Bunkers left mover vector. The second is the form DDD/SS, where DDD is the direction the storm is coming from, and SS is the speed in knots (e.g. 240/25).", default='right-mover')
    ap.add_argument('-s', '--sfc-wind', dest='sfc_wind', help="Surface wind vector. It takes the form DDD/SS, where DDD is the direction the storm is coming from, and SS is the speed in knots (e.g. 240/25).")
    ap.add_argument('-t', '--start-time', dest='start_time', help="Start time to plot. Takes the form DD/HHMM, where DD is the day, HH is the hour, and MM is the minute.")
    ap.add_argument('-e', '--end-time', dest='end_time', help="End time to plot. Takes the form DD/HHMM, where DD is the day, HH is the hour, and MM is the minute.")
    ap.add_argument('-f', '--img-name', dest='img_name', help="Name of the file produced.")
    ap.add_argument('-p', '--local-path', dest='local_path', help="Path to local data. If not given, download from the Internet.")
    ap.add_argument('-c', '--cache-path', dest='cache_path', help="Path to local cache. Data downloaded from the Internet will be cached here.")
    ap.add_argument('-w', '--web-mode', dest='web', action='store_true')
    ap.add_argument('-x', '--fixed-frame', dest='fixed', action='store_true')
    args = ap.parse_args()

    np.seterr(all='ignore')
    
    start_time = args.start_time
    end_time = args.end_time
    loop_time = start_time
    minute = timedelta(minutes=1)
    tmp = pd.DataFrame()
    while loop_time <= end_time:
        try:
            vad_plotter(args.radar_id,
                        storm_motion=args.storm_motion,
                        sfc_wind=args.sfc_wind,
                        time=loop_time,
                        fname=args.img_name,
                        local_path=args.local_path,
                        cache_path=args.cache_path,
                        web=args.web,
                        fixed=args.fixed
                        )
            tmp = tmp.append(params, loop_time)
        except:
            if args.web:
                print(json.dumps({'error':'error'}))
            else:
                print('This time does not exist. Continuing to next time.')
        loop_time_dt = datetime.strptime(loop_time, '%Y-%m-%d/%H%M')
        loop_time_dt += minute
        loop_time = datetime.strftime(loop_time_dt, '%Y-%m-%d/%H%M')  
    tmp.to_csv('parameters.csv')

I have it working so that I get a csv file that looks something like this (I've shortened it for this example):

    shear_mag_1000m
0   26             
1   32              
2   29              
3   27              

But I would like to have a time column that has each corresponding successful time so it looks more like this:

time   shear_mag_1000m   
2100   26             
2200   32             
2300   29             
2400   27             

I think the times would be the loop_time, but I don't know how to only have the successful loop times included (For example, I'd have a start time of 2100 and an end time of 2150 with an increment of 1 minute. However, there might only be data available at 2100, 2124, and 2148. These are currently the only times the hodographs are plotted for and the parameters are added to the csv file). Any help to add the time column is appreciated!

CodePudding user response:

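First, record loop_time only for the iterations that succeed, for example by storing it in a loop_time column alongside the parameters.

Then use set_index from pandas to make the loop_time column the index before saving the output to CSV. Note that set_index returns a new DataFrame, so assign the result:

tmp = tmp.set_index('loop_time')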
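
As a minimal sketch of that idea, assume the successful times already sit in a column next to the parameters. The column name `time` is an assumption, and the sample values are simply copied from the table in the question.

```python
import pandas as pd

# Illustrative data only: times and values copied from the question's example table.
tmp = pd.DataFrame({
    "time": ["2100", "2200", "2300", "2400"],
    "shear_mag_1000m": [26, 32, 29, 27],
})

# set_index returns a new DataFrame, so the result has to be assigned.
tmp = tmp.set_index("time")

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = tmp.to_csv()
print(csv_text)
```

Because the index is named `time`, the CSV header starts with `time` instead of an empty first column.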

CodePudding user response:

TIP: Avoid appending to dataframes (besides, DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0). Prefer appending to lists or dictionaries instead, data structures that are meant to "grow".

Use pandas.date_range() to create an equally spaced index between the requested dates, like this (make sure your date formats are recognized):

import pandas as pd

## Request 1 minute intervals, for more `freq` values
#  see https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases
tindex = pd.date_range(start_time, end_time, freq="1min")  # "T" is a deprecated alias for "min"
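
For the `'%Y-%m-%d/%H%M'` strings used in the question, it is probably safest to parse the endpoints explicitly before building the range. The sketch below assumes made-up start and end values. It uses the `min` alias, since newer pandas releases deprecate `T`.

```python
import pandas as pd

# Assumed inputs in the question's '%Y-%m-%d/%H%M' format (the dates are made up).
start_time = "2022-07-12/2100"
end_time = "2022-07-12/2150"

fmt = "%Y-%m-%d/%H%M"
start = pd.to_datetime(start_time, format=fmt)
end = pd.to_datetime(end_time, format=fmt)

# Both endpoints are included, so 21:00 to 21:50 gives 51 one-minute steps.
tindex = pd.date_range(start, end, freq="1min")

# Convert a timestamp back to the string form that the plotting code parses.
first_as_text = tindex[0].strftime(fmt)
```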

...and then feed this into a function requesting the NEXRAD VAD for each date. But since vad_plotter() may throw occasional exceptions (e.g. "data for time(..) do not exist"), wrap the call in a new function that handles the exception:

def fetch_nexrad_vad(timestamp, ..., errors: list):
    try:
        return vad_plotter(time=timestamp, ...)
    except Exception as ex:
        # Record the failure; the caller gets None for this timestamp.
        errors.append(f"{timestamp}: failed getting VAD from NEXRAD due to: {ex}")
        return None

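A sane step is to use a list comprehension to create one record per timestamp, drop the timestamps that failed (the wrapper returns None for them), and then build the dataframe from what is left:

errors = []
results = [(t, fetch_nexrad_vad(t, ..., errors=errors)) for t in tindex]
good = [(t, rec) for t, rec in results if rec is not None]
df = pd.DataFrame.from_records([rec for _, rec in good])
df.index = pd.DatetimeIndex([t for t, _ in good], name="time")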
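
Put together, the whole flow might look like the sketch below. It is not the real vad-plotter API: `fake_vad_plotter`, `AVAILABLE`, and the dates are hypothetical stand-ins. The stand-in assumes `vad_plotter()` returns a dict of parameters and raises when no data exist for a time. Only 21:00, 21:24, and 21:48 have data, mirroring the example in the question.

```python
import pandas as pd

# Hypothetical stand-in for vad_plotter(): returns a dict of parameters for
# times that have data and raises for every other time.
AVAILABLE = {"2100": 26, "2124": 32, "2148": 29}

def fake_vad_plotter(time):
    key = time.strftime("%H%M")
    if key not in AVAILABLE:
        raise ValueError("no data for this time")
    return {"shear_mag_1000m": AVAILABLE[key]}

def fetch(timestamp, errors):
    """Return the parameter dict, or None after recording the failure."""
    try:
        return fake_vad_plotter(time=timestamp)
    except Exception as ex:
        errors.append(f"{timestamp}: {ex}")
        return None

tindex = pd.date_range("2022-07-12 21:00", "2022-07-12 21:50", freq="1min")

errors = []
# Pair every timestamp with its result, then keep only the successful ones.
results = [(t, fetch(t, errors)) for t in tindex]
good = [(t, rec) for t, rec in results if rec is not None]

# Build the frame once, with the successful times as a named index.
df = pd.DataFrame(
    [rec for _, rec in good],
    index=pd.Index([t.strftime("%H%M") for t, _ in good], name="time"),
)
csv_text = df.to_csv()
print(csv_text)
```

Only the three times with data end up as rows, and the other 48 minutes are recorded in `errors`.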

...but experienced pandas programmers know faster/terser ways to build the dataframe from the index function in one step.

NOTE: using the error list to collect invalid times makes it pretty simple to convert it into JSON. Or, even better, collect just the invalid timestamps if the thrown exception is always the same.
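
As a rough illustration of that note, a list of failed timestamps could be turned into JSON as follows. The `failed` list and the `failed_times` key are made up for this example.

```python
import json
import pandas as pd

# Hypothetical list of timestamps for which no data was found.
failed = [pd.Timestamp("2022-07-12 21:01"), pd.Timestamp("2022-07-12 21:02")]

# Timestamps are not JSON serializable directly, so convert them to ISO strings.
payload = json.dumps({"failed_times": [t.isoformat() for t in failed]})
print(payload)
```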
