Home > OS >  Looping through a data range to download data from an API
Looping through a data range to download data from an API

Time:10-26

I am downloading data from an API, the maximum look back period is 833 days, but from my testing I know they have data going back to 2002. I have a function below that defines 833 days from today into the two date times "end" and "start", these are fed into an API command. Note they need to be in a string format and formatted in that way for the api to accept them.

d=datetime.today()
end = str(d.year)   "-"   str(d.month)   "-"   str(d.day)

lookbook_period = 833

# Take current date and minus the the max time delta of 833 days to get 'start' var
time_delta = timedelta(days=lookbook_period)
now = datetime.now()

#split the answer and format it to the required timestamp type.
start = str(now - time_delta).split(" ")[0]

What I want to do is download the dataframes in 833 day sections and then piece them all together into a CSV or dataframe. I have below what I have so far but I am not sure how to make a function that changes the dates as it goes.


def time_machine():
    df_total = pd.DataFrame
    
    start_str = str(2002)   "-0"   str(5)   "-0"   str(1)
    start = datetime(2002,5,1)
    print(start)
    
    # amount of days from 2002-05-01 to now
    rolling_td = timedelta(days=int(str((datetime.today() - start)).split(" ")[0]))
    print(rolling_td, "\n")
    
    # API maximum amount of lookbook days
    max_td = timedelta(days=833)
    
    # The function would do something similar to this, and on each pass, calling the API and saving the data to a dataframe or CSV.


    s1 = start   max_td
    print(s1)
    s2 = s1   max_td
    print(s2)
    s3 = s2   max_td
    print(s3)
    
    d=datetime.today()
    end = str(d.year)   "-"   str(d.month)   "-"   str(d.day)
    print(d)

Any suggestions or tools/libraries to look at would be greatly appreciated. I have been testing things with while-loops but I am still running blindly into the dark on this one.

This a rough sudo code of what I think I need, but Im still unsure on how to move onto the next section

while count > 0 and > 833:
        start = 
        end =
        
        
    call the API first to download first set of data. 
    Check date range:
        get most recent date   833 days to it 
            Download next section
                repeat
    
    if count < 833: 
            calulate requied dates for start and end 

CodePudding user response:

If you first define your date ranges, you will be able to iterate through each 833 day period to pull data using the API. You'll then need to append the data to the dataframe (or csv) for each iteration.

import datetime as dt

# Date range to pull data over
start_date = dt.date(2002,5,1)
end_date = dt.date.today()
delta = dt.timedelta(days=832) # 832 so you have a range of 833 days inclusive

# Iterating from start date, recording date ranges of 833 days
date_ranges = []
temp_start_date = start_date
while temp_start_date < end_date:
    temp_end_date = temp_start_date   delta 
    if temp_end_date > end_date:
        temp_end_date = end_date
    date_ranges.append([temp_start_date, temp_end_date])
    temp_start_date = temp_end_date   dt.timedelta(days=1)

# For each date range, pass dates into API
# Initialise dataframe here
for start_date, end_date in date_ranges:
    start_date_str = start_date.strftime("%Y-%m-%d")
    end_date_str = end_date.strftime("%Y-%m-%d")

    # Input to API with start and end dates in correct string format

    # Process data into dataframe

There should be no need to count 833 days, as you said the API takes the start and end dates as arguments, hence you just need to find those for each date range.

CodePudding user response:

You can use pandas date_range to get your dates every 833 days and just iterate through. Also, your date format looks to be isoformat.

Put your code inside the for loop to get the data into the dataframe, then output to csv.

dates_list = pd.date_range(start=datetime.now(), periods=10, freq='-833D')

for i in range(1, len(dates_list)):
    # print(dates_list[i], dates_list[i-1])
    start = dates_list[i].date().isoformat()
    end = dates_list[i-1].date().isoformat()
    print(start, end)

2019-07-15 2021-10-25
2017-04-03 2019-07-15
2014-12-22 2017-04-03
2012-09-10 2014-12-22
2010-05-31 2012-09-10
2008-02-18 2010-05-31
2005-11-07 2008-02-18
2003-07-28 2005-11-07
2001-04-16 2003-07-28
  • Related