I am new to programming and am attempting to create a prediction model for multiple articles. Unfortunately, using Excel or similar software is not possible for this task. Therefore, I have installed RStudio to solve this problem. My goal is to make an 18-month prediction for each article in my dataset using an ARIMA model.
However, I am currently facing an issue with the format of my data frame. Specifically, I am unsure of how my CSV should be structured to be read by my code.
I have attached an image of my current dataset in CSV format : https://i.stack.imgur.com/AQJx1.png
Here is my dput(sales_data) :
structure(list(X.Article.1.Article.2.Article.3 = c("janv-19;42;49;55", "f\xe9vr-19;56;58;38", "mars-19;55;59;76")), class = "data.frame", row.names = c(NA, -3L))
I have also included the code I have put together so far with the help of blogs and websites:
library(forecast)
library(reshape2)
sales_data <- read.csv("sales_data.csv", header = TRUE)
sales_data_long <- reshape2::melt(sales_data, id.vars = "Code Article")
for(i in 1:nrow(sales_data_long)) {
sales_data_article <- subset(sales_data_long, sales_data_long$`Code Article` == sales_data_long[i,"Code Article"])
sales_ts <- ts(sales_data_article$value, start = c(2010,6), frequency = 12)
arima_fit <- auto.arima(sales_ts)
arima_forecast <- forecast(arima_fit, h = 18)
print(arima_forecast)
print(paste("Article:", sales_data_long[i, "Code Article"]))
}
With this code, RStudio gives me the following error : "Error: id variables not found in data: Code Article"
Currently, I am not interested in generating any plots or outputs. My main focus is on identifying the appropriate format for my data.
Do I need to modify my CSV file and separate each column using "," or ";"? Or, can I keep my data in its current format and make adjustments in the code instead?
CodePudding user response:
I added the dput output, as per jrcalabrese's request. I swapped reshape2 for its replacement, tidyr, and used pivot_longer. This no longer gives the error that reshape2::melt raised. The structure of the CSV doesn't matter so much; yours was fine. Hope this helps! :-)
library(tidyr)
sales_data <- structure(list(var1 = c("Article 1", "Article 2", "Article 3"),
`janv-19` = c(42, 56, 55),
`fev-19` = c(49, 58, 59),
`mars-19` = c(55, 38, 76)),
row.names = c(NA, 3L), class = "data.frame")
sales_data_long <- sales_data |> pivot_longer(!var1,
names_to = "month",
values_to = "count")
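On the separator question: the dput output in the question shows every row packed into a single column, which suggests the file is semicolon-delimited while read.csv expects commas by default. Here is a minimal sketch of reading it with an explicit separator. The inline text stands in for sales_data.csv, and the "Month" header for the first column is an assumption (the real file appears to have an empty first header).

```r
# Assumption: the file is semicolon-delimited, as the dput output suggests.
# The inline text below stands in for sales_data.csv; "Month" is a hypothetical header.
csv_text <- "Month;Article 1;Article 2;Article 3
janv-19;42;49;55
fevr-19;56;58;38
mars-19;55;59;76"

# sep = ";" splits on semicolons; check.names = FALSE keeps "Article 1" as written
# instead of converting it to "Article.1".
sales_data <- read.csv(text = csv_text, sep = ";", header = TRUE,
                       check.names = FALSE)

str(sales_data)
```

With a real file you would pass the file name instead of text =, for example read.csv("sales_data.csv", sep = ";"). If the accented month names still come through garbled, adding fileEncoding = "latin1" may help, but that is a guess about how the file was saved.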
CodePudding user response:
I modified my code and finally managed to get something. Here is my code:
library(forecast)
library(readr)
sales_data <- read_csv("sales_data.csv")
sales_ts_list <- list()
for(i in 1:ncol(sales_data)){
sales_ts_list[[i]] <- ts(sales_data[,i], start = c(2019, 1), frequency = 12)
}
sales_arima_list <- list()
for(i in 1:length(sales_ts_list)){
sales_arima_list[[i]] <- auto.arima(sales_ts_list[[i]])
}
sales_forecast_list <- list()
for(i in 1:length(sales_arima_list)){
sales_forecast_list[[i]] <- forecast(sales_arima_list[[i]], h = 18)
}
for(i in 1:length(sales_forecast_list)){
print(sales_forecast_list[[i]])
}
for(i in 1:length(sales_forecast_list)){
plot(sales_forecast_list[[i]])
mtext(colnames(sales_data)[i], side = 3, line = 0, cex = 1.5)
}
This seems to work because I do get a graph and a list. However, I only see one graph instead of one graph per article. I think it's because I have too many items in my file. Is it possible to call a particular graph? For example: "I want to display the graph of item n°87"?
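One possible way to display a single graph is to index the forecast list directly, by position or by article name. In RStudio each call to plot() replaces the previous one in the Plots pane, which is likely why only the last graph is visible after the loop. The sketch below uses made-up data in place of sales_data.csv, and the article names are hypothetical. It assumes every column is numeric (no date column).

```r
library(forecast)

# Hypothetical stand-in data: one numeric column per article, 36 monthly values each.
set.seed(1)
sales_data <- data.frame(
  `Article 1` = round(50 + rnorm(36, sd = 5)),
  `Article 2` = round(60 + rnorm(36, sd = 5)),
  check.names = FALSE
)

# Fit and forecast each column; lapply keeps the column names on the result list.
sales_forecast_list <- lapply(sales_data, function(x) {
  fit <- auto.arima(ts(x, start = c(2019, 1), frequency = 12))
  forecast(fit, h = 18)
})

# Display one particular graph, either by name or by position.
plot(sales_forecast_list[["Article 2"]], main = "Article 2")
plot(sales_forecast_list[[1]], main = names(sales_forecast_list)[1])
```

With the list built by the loops above, the equivalent call for item 87 would be plot(sales_forecast_list[[87]]), provided the list has at least 87 elements.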