What is window_size in time-series and what are the advantages and disadvantages of having small and

I am quite a beginner in machine learning. I have tried hard to understand this concept, but I could not make sense of what I found on Google. I need to understand it in a simple way.

Please explain this in simple words and in as much detail as possible.

CodePudding user response：

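"Window size" typically refers to the number of time periods that are used to calculate a statistic or fit a model.

The advantages and disadvantages of different window sizes come down to a trade-off between sensitivity to changes in the data and susceptibility to noise and outliers. A small window reacts quickly to real changes but is also thrown off more easily by noise; a large window smooths out noise but is slower to react.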

If you have ever dealt with moving-average indicators on the stock market, you will know that each window size has a purpose, and that different window sizes are often used in combination to get a more holistic view, e.g. MA20 vs MA50 vs MA100. Each of these indicators uses a different window size to calculate the moving average of the stock of interest.
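To make the trade-off concrete, here is a minimal sketch of a trailing simple moving average in plain Python. The function name `moving_average` and the two toy series are assumptions made up for illustration, not real market data. One series contains a single outlier and the other a genuine level change, so the effect of a short versus a long window can be compared.

```python
def moving_average(values, window_size):
    """Trailing simple moving average; one output per full window."""
    if window_size < 1 or window_size > len(values):
        raise ValueError("window_size must be between 1 and len(values)")
    return [
        sum(values[i - window_size + 1 : i + 1]) / window_size
        for i in range(window_size - 1, len(values))
    ]

# Hypothetical series: flat at 10 with a single outlier of 20.
noisy = [10, 10, 10, 20, 10, 10, 10, 10]
# Hypothetical series: a real, lasting jump from 10 to 14.
step = [10, 10, 10, 10, 14, 14, 14, 14]

# The short window is pulled further by the outlier than the long one.
print(max(moving_average(noisy, 2)))  # 15.0
print(max(moving_average(noisy, 4)))  # 12.5

# The short window reaches the new level sooner than the long one.
print(moving_average(step, 2))  # [10.0, 10.0, 10.0, 12.0, 14.0, 14.0, 14.0]
print(moving_average(step, 4))  # [10.0, 11.0, 12.0, 13.0, 14.0]
```

In this toy data the 2-period average overshoots to 15 because of one bad point, while the 4-period average only reaches 12.5; on the other hand, the 4-period average needs more periods to settle at the new level of 14.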

Image Source: Yahoo Finance

CodePudding user response：

This question is better suited to Stack Exchange, as it is not a specific coding question.

Window size is the number of past observations that you ask an algorithm to consider when learning from a time series. For example, if you need to predict tomorrow's temperature and you use a window of 5 days, the algorithm divides your entire time series into overlapping segments of 6 days (5 input days and 1 prediction day) and tries to learn, from the historical records, how to predict the next day using only the previous 5 days of data.
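The segmentation described here can be sketched in a few lines of plain Python. This is only an illustration under assumed names: `make_windows`, its `horizon` parameter, and the temperature values are hypothetical, and real libraries provide their own utilities for this step.

```python
def make_windows(series, window_size, horizon=1):
    """Split a series into overlapping (inputs, targets) samples.

    Each sample uses `window_size` consecutive values as inputs and the
    following `horizon` values as the targets to predict.
    """
    segment = window_size + horizon
    samples = []
    for start in range(len(series) - segment + 1):
        inputs = series[start : start + window_size]
        targets = series[start + window_size : start + segment]
        samples.append((inputs, targets))
    return samples

# Hypothetical daily temperatures (8 days).
temps = [20, 21, 19, 22, 23, 24, 22, 21]

samples = make_windows(temps, window_size=5)
for inputs, targets in samples:
    print(inputs, "->", targets)
# [20, 21, 19, 22, 23] -> [24]
# [21, 19, 22, 23, 24] -> [22]
# [19, 22, 23, 24, 22] -> [21]
```

With 8 days of data and 6-day segments, only 3 samples come out, which already hints at how the window size limits the amount of training data.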

Advantage of a short window: you get more samples out of the time series, so your estimates of short-term effects are more reliable. A 100-day historical time series gives you 95 samples with a 5-day window (each sample is a 6-day segment: 5 input days plus 1 prediction day), so the model is more certain about how the past 5 days influence the next day's temperature.

Advantage of a long window: long windows let the model learn seasonal and trend effects better (think of events that recur yearly, monthly, etc.). If your window is small, say 5 days, your model cannot learn a seasonal effect that recurs monthly. If your window is 60 days, however, every sample you feed to the model covers about two cycles of the monthly effect, which lets the model learn that seasonality.

The downside of a long window is that the number of samples decreases. With a 100-day time series, a 60-day window yields only 40 samples. Every parameter of your model is then fitted on a much smaller sample, which may reduce the reliability of the model.
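The sample counts quoted in this answer follow from simple arithmetic: a series of length n, cut into overlapping segments of `window_size` input days plus 1 prediction day, gives n - window_size - 1 + 1 samples. A small sketch, assuming a hypothetical helper named `count_samples`:

```python
def count_samples(series_length, window_size, horizon=1):
    """Number of overlapping (inputs, targets) samples a series yields."""
    return max(0, series_length - window_size - horizon + 1)

# 100-day series, predicting 1 day ahead, for a few window sizes.
for window_size in (5, 30, 60):
    print(window_size, count_samples(100, window_size))
# 5 95
# 30 70
# 60 40
```

Every extra day added to the window removes one sample from the training set, so the choice is always between seeing more context per sample and having more samples to learn from.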