Home > database >  Approach to storing forecast time series data using python
Approach to storing forecast time series data using python

Time:07-05

So I want to scrape a weather forecast table once a day and store my results for future analysis. I want to store the data but im not sure how to.

Example of the data: Forecast Table

My four variables of interest are wind speed, wind gusts, wave height, and wave period.

This is my first python project involving time series data and I’m fairly new to databases so go easy on me and ELI5 please.

In the Python For Everyone course I recently took, I learned about relational databases and using SQLlite. The main idea here was basically to be efficient in storing data and never storing the same data twice. However none of the examples involved time series data. And so now I’m not sure what the best approach here is.

If I create a table for each variable and lastly one for the date I scraped the forecast. The date of scraping would then serve as the primary key. In this example the variables such as windspeed's first column would be date of scraping followed by the next columns being the forecasted values for the time stamps. Although this would make the storage more efficient as opposed to creating a new table every day, there are a few problems. The timestamps are not uniform (see image, forecast times are only from 3am to 9pm). Also depending on the time of day that the forecast is scraped the date and time values on the timestamps are always changing and thus the next timestamp is not always in 2 hours.

Seeing as each time I scrape the forecast, I get a new table, should I create a new database table each time in sqlite? This seems like a rather rudimentary solution, and I’m sure there are better ways to store the data.

How would you go about this?

CodePudding user response:

You should consider using the pandas module for storing your data. Pandas manages data as a table, called a dataframe, and has many methods to operate on the data. In addition, you can easily append new data to the pandas dataframe. Dataframes can be stored in several formats including CSV. Information on pandas is available here: https://pandas.pydata.org/docs/getting_started/index.html#getting-started

CodePudding user response:

Summarizing my comments:

You may want to append forecast data from a new scrapping to the existing data in the same database table.

From each new web-scrapping you will get approx. 40 new records with the same scrapping time stamp but different forecast time stamp.

e.g., this would be the columns of the table using ID as primary key with AUTOINCREMENT:

ID Scrapping_time Forecast_hours Wind_speed Wind_gusts Wind_direction Wave Wave_period wave_direction

Note: if you use SQLite, you could leave out the ID column as SQLite would add such ROWID column by default if no other primary key had been specified (https://www.sqlite.org/autoinc.html)

  • Related