Prediction on time series analysis using ARIMA in R

I am new to programming and am attempting to create a prediction model for multiple articles. Unfortunately, using Excel or similar software is not possible for this task, so I have installed RStudio to solve this problem. My goal is to make an 18-month prediction for each article in my dataset using an ARIMA model.

However, I am currently facing an issue with the format of my data frame. Specifically, I am unsure of how my CSV should be structured to be read by my code.

I have attached an image of my current dataset in CSV format : https://i.stack.imgur.com/AQJx1.png

Here is my dput(sales_data) : structure(list(X.Article.1.Article.2.Article.3 = c("janv-19;42;49;55", "f\xe9vr-19;56;58;38", "mars-19;55;59;76")), class = "data.frame", row.names = c(NA, -3L))

Here is also the code I have put together so far with the help of blogs and websites:

library(forecast)
library(reshape2)

sales_data <- read.csv("sales_data.csv", header = TRUE)

sales_data_long <- reshape2::melt(sales_data, id.vars = "Code Article")

for(i in 1:nrow(sales_data_long)) {
  
  sales_data_article <- subset(sales_data_long, sales_data_long$`Code Article` == sales_data_long[i,"Code Article"])
  
  sales_ts <- ts(sales_data_article$value, start = c(2010,6), frequency = 12)
  
  arima_fit <- auto.arima(sales_ts)

  arima_forecast <- forecast(arima_fit, h = 18)
  
  print(arima_forecast)
  print("Article: ", Code article[i])
}

With this code, RStudio gives me the following error : "Error: id variables not found in data: Code Article"

Currently, I am not interested in generating any plots or outputs. My main focus is on identifying the appropriate format for my data.

Do I need to modify my CSV file and separate each column using "," or ";"? Or, can I keep my data in its current format and make adjustments in the code instead?
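
A note on the separator question: the dput output above shows a single column whose values look like "janv-19;42;49;55", which suggests the file is semicolon-separated but was read with the default comma separator. A minimal sketch of reading such a file without changing the CSV is shown here. The temporary file, the ASCII month label "fevr-19", and the column name "month" are assumptions made for illustration only.

```r
# Recreate the three rows from the dput output as a small
# semicolon-separated file (hypothetical stand-in for sales_data.csv).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  ";Article 1;Article 2;Article 3",
  "janv-19;42;49;55",
  "fevr-19;56;58;38",
  "mars-19;55;59;76"
), csv_path)

# sep = ";" splits the fields; check.names = FALSE keeps "Article 1"
# instead of converting it to "Article.1".
sales_data <- read.csv(csv_path, sep = ";", header = TRUE, check.names = FALSE)

# The first header field is empty, so give the date column a name
# ("month" is a hypothetical choice).
names(sales_data)[1] <- "month"

str(sales_data)
```

read.csv2() uses ";" as its default separator (and "," as the decimal mark), so it may be a shorter alternative for files exported with European locale settings.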

CodePudding user response：

I added the dput output as per jrcalabrese's request and swapped reshape2 for its replacement, tidyr, using pivot_longer. This no longer gives the error that reshape2::melt produced. The structure of the CSV does not matter that much; yours was fine. Hope this helps! :-)

library(tidyr)
sales_data <- structure(list(var1 = c("Article 1", "Article 2", "Article 3"),
`janv-19` = c(42, 56, 55),
`fev-19` = c(49, 58, 59),
`mars-19` = c(55, 38, 76)),
row.names = c(NA, 3L), class = "data.frame")

sales_data_long <- sales_data |> pivot_longer(!var1,
                                              names_to = "month",
                                              values_to = "count")
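
As a possible next step, the long table can be split by article to get one monthly series per article, ready to pass to a model. This is only a sketch: the start date of January 2019 is an assumption taken from the column names, and sales_ts_list is a hypothetical name.

```r
library(tidyr)

# Same small example data as above (articles in rows, months in columns).
sales_data <- data.frame(
  var1 = c("Article 1", "Article 2", "Article 3"),
  `janv-19` = c(42, 56, 55),
  `fev-19` = c(49, 58, 59),
  `mars-19` = c(55, 38, 76),
  check.names = FALSE
)

sales_data_long <- pivot_longer(sales_data, !var1,
                                names_to = "month",
                                values_to = "count")

# Split the counts by article and turn each group into a monthly ts.
# Assumption: the first month is January 2019.
sales_ts_list <- lapply(
  split(sales_data_long$count, sales_data_long$var1),
  ts, start = c(2019, 1), frequency = 12
)

print(sales_ts_list[["Article 1"]])
```

Each element of the resulting list is named after its article, so a series can be retrieved by name rather than by position.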

CodePudding user response：

I modified my code and finally managed to get something. Here is my code :

library(forecast)
library(readr)

sales_data <- read_csv("sales_data.csv")

# Build one monthly time series per column, starting in January 2019
sales_ts_list <- list()
for (i in seq_along(sales_data)) {
  sales_ts_list[[i]] <- ts(sales_data[, i], start = c(2019, 1), frequency = 12)
}

# Fit an ARIMA model to each series
sales_arima_list <- list()
for (i in seq_along(sales_ts_list)) {
  sales_arima_list[[i]] <- auto.arima(sales_ts_list[[i]])
}

# Forecast 18 months ahead with each model
sales_forecast_list <- list()
for (i in seq_along(sales_arima_list)) {
  sales_forecast_list[[i]] <- forecast(sales_arima_list[[i]], h = 18)
}

# Print every forecast
for (i in seq_along(sales_forecast_list)) {
  print(sales_forecast_list[[i]])
}

# Plot every forecast, with the column name as the title
for (i in seq_along(sales_forecast_list)) {
  plot(sales_forecast_list[[i]])
  mtext(colnames(sales_data)[i], side = 3, line = 0, cex = 1.5)
}

This seems to work, because I do get a graph and a list. However, I only see one graph, not the graphs of all the articles. I think it's because I have too many items in my file. Is it possible to call up a particular graph? For example: "I want to display the graph of item n°87"?
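
One way to display a single graph is to index the list of forecasts directly. In RStudio, each new plot usually replaces the previous one in the Plots pane, so after the loop only the last graph is visible, although the earlier ones can typically be reached with the pane's back arrow. The sketch below uses simulated data as a stand-in for the real file (the three articles and their values are hypothetical) and names each list element after its column so that a forecast can be picked by position or by name.

```r
library(forecast)

# Hypothetical stand-in for the real data: 3 articles, 36 months each.
set.seed(1)
sales_data <- data.frame(
  `Article 1` = rpois(36, 50),
  `Article 2` = rpois(36, 55),
  `Article 3` = rpois(36, 60),
  check.names = FALSE
)

# One forecast per column; lapply keeps the column names as list names.
sales_forecast_list <- lapply(sales_data, function(x) {
  fit <- auto.arima(ts(x, start = c(2019, 1), frequency = 12))
  forecast(fit, h = 18)
})

# Plot a single article, selected by position ...
plot(sales_forecast_list[[2]], main = names(sales_forecast_list)[2])

# ... or by name.
plot(sales_forecast_list[["Article 3"]], main = "Article 3")
```

With the real data, plot(sales_forecast_list[[87]]) would show item 87 in the same way, provided the list has at least 87 elements.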