I have a dataset with four columns:
1st column: time series data
2nd & 3rd column: categorical variable (let's say X and Y)
4th column: numerical variable (the value to be predicted).
You can assume X is the direction (E, S, W, N) and Y is the province name.
For example:
| date | categorical variable 1 | categorical variable 2 | target value |
| 2021/10/01 | X1 | Y1 | 1.02 |
| 2021/10/01 | X1 | Y2 | 0.59 |
| 2021/10/01 | X1 | Y4 | 2.11 |
| 2021/10/01 | X2 | Y1 | 0.68 |
| ...
| 2021/10/01 | X4 | Y8 | 2.68 |
| ...
| 2021/10/30 | X4 | Y5 | 1.00 |
| 2021/10/30 | X4 | Y7 | 0.98 |
| 2021/10/30 | X4 | Y8 | 1.66 |
I need to predict the target value for the coming period (e.g. the target values for the first week of Nov. 2021).
| date | categorical variable 1 | categorical variable 2 | target value |
| 2021/11/01 | X1 | Y1 | ??? |
| 2021/11/01 | X1 | Y2 | ??? |
| ...
| 2021/11/07 | X1 | Y4 | ??? |
I tried applying an LSTM, but it does not seem to be a good fit. Is there a better or more feasible model that anyone could suggest? Any other ideas are welcome.
Thanks a lot!
CodePudding user response:
Your last column (the one to be predicted) is your time series; the first column is just a time index. You could try SARIMAX or vector autoregression (VAR), since both are common approaches with many examples and tutorials online. Keep in mind that you may need to convert the categorical data into numeric data, and that your data needs to be stationary (or be made stationary, for example by differencing).
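As a rough illustration of those two preparation steps, here is a minimal pandas sketch. The column names (`date`, `X`, `Y`, `target`) and the toy values are assumptions made up for this example, and first differencing is only one possible way to work toward stationarity; whether it is enough depends on your data.

```python
import pandas as pd

# Hypothetical toy data in the question's layout (values invented for illustration).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-10-01", "2021-10-01",
        "2021-10-02", "2021-10-02",
        "2021-10-03", "2021-10-03",
    ]),
    "X": ["X1", "X2", "X1", "X2", "X1", "X2"],
    "Y": ["Y1", "Y1", "Y1", "Y1", "Y1", "Y1"],
    "target": [1.0, 0.5, 1.5, 0.75, 2.5, 0.5],
})

# Step 1: convert the categorical columns into numeric 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["X", "Y"], dtype=int)

# Step 2: first-difference the target within each (X, Y) series.
# The first observation of every series has no predecessor, so it becomes NaN.
df = df.sort_values(["X", "Y", "date"])
df["target_diff"] = df.groupby(["X", "Y"])["target"].diff()

print(encoded.columns.tolist())
print(df[["date", "X", "Y", "target", "target_diff"]])
```

Differencing is done per (X, Y) group here so that the value of one series is never subtracted from another.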
In time series forecasting, you usually work with a series of equally spaced observations. I do not know your data, but if (for example) you have one observation per province per day with a varying direction, then you should split the data into separate series, one per province. To be honest, though, this looks more like a regression problem.
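The splitting step could look roughly like the sketch below. It assumes hypothetical column names (`date`, `X`, `Y`, `target`) and invented values, and it splits by each (X, Y) combination rather than by province alone, which is one possible choice. Reindexing to a daily frequency makes gaps in the observations visible as NaN.

```python
import pandas as pd

# Hypothetical toy data; 2021-10-03 is deliberately missing.
rows = [
    ("2021-10-01", "X1", "Y1", 1.02),
    ("2021-10-01", "X1", "Y2", 0.59),
    ("2021-10-02", "X1", "Y1", 1.10),
    ("2021-10-02", "X1", "Y2", 0.61),
    ("2021-10-04", "X1", "Y1", 0.95),
    ("2021-10-04", "X1", "Y2", 0.70),
]
df = pd.DataFrame(rows, columns=["date", "X", "Y", "target"])
df["date"] = pd.to_datetime(df["date"])

# One column per (X, Y) combination, one row per date.
wide = df.pivot_table(index="date", columns=["X", "Y"], values="target")

# Force an equally spaced daily index; days without data show up as NaN rows.
wide = wide.asfreq("D")

# Each column is now its own time series that can be modelled separately.
per_series = {key: wide[key] for key in wide.columns}

print(wide)
```

How to handle the NaN rows (interpolate, forward-fill, or leave them to the model) is a separate decision that depends on why the observations are missing.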
CodePudding user response:
In this case, I would use time as a variable: break the date down into derived features and check whether they correlate with the target variable. For example, you can flag whether the date falls on a weekend, whether it is night or day, a holiday, or a special event that you think affects your target variable. I have done a project using this approach, and I will be happy to share it with you.
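A minimal sketch of that idea with pandas is shown below. The column names and values are invented for illustration, and only calendar features that can be derived from the date itself are included; holiday or special-event flags would need an external calendar, which is not shown here.

```python
import pandas as pd

# Hypothetical toy data: 2021-10-01 is a Friday, so the range covers Fri to Mon.
df = pd.DataFrame({
    "date": pd.date_range("2021-10-01", periods=4, freq="D"),
    "target": [1.0, 2.0, 3.0, 2.0],
})

# Derive calendar features from the date column.
df["day_of_week"] = df["date"].dt.dayofweek   # Monday=0 ... Sunday=6
df["is_weekend"] = df["day_of_week"] >= 5     # Saturday or Sunday
df["day_of_month"] = df["date"].dt.day

# A first, crude look at whether a feature relates to the target:
# compare the average target on weekends and on weekdays.
mean_by_weekend = df.groupby("is_weekend")["target"].mean()

print(df)
print(mean_by_weekend)
```

These derived columns can then be fed, together with the encoded categorical variables, into an ordinary regression model.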