How to apply machine learning to a CSV file to predict future values

I'm curious about ML and I wonder if some of you could help me get started.

I have a dataset in CSV format like this:

Date	First	Second	Third
2022-12-30	5402	8694	8648
2022-12-29	3804	8529	6690
2022-12-28	3192	2779	2166

I want to predict the First, Second, and Third values for a future date, e.g. 2022-12-31.

What kind of algorithm is suitable for this job? How should I implement it in my Jupyter notebook? Any example or reference for this problem would be very helpful. This is for predicting a 4-digit lottery game.

I have used pandas to read my CSV file and assigned the result to a variable named "dataset":

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

dataset=pd.read_csv("C:/Users/Administrator/Desktop/data.csv")

dataset['Date'] = pd.to_datetime(dataset.Date)

CodePudding user response：

One popular method for time series forecasting is the ARIMA (AutoRegressive Integrated Moving Average) model. You can use the statsmodels library in Python to implement an ARIMA model in your Jupyter notebook.

Here is an example of how you can use the statsmodels library to fit an ARIMA model to your time series data and make predictions:

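import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Load the DataFrame, parsing Date as a datetime and using it as the index
df = pd.read_csv("data.csv", parse_dates=["Date"], index_col="Date")

# Sort oldest-first and declare a daily frequency (assumes one draw per day)
df = df.sort_index().asfreq("D")

# ARIMA is univariate, so fit one model per column
predictions = {}
for column in ["First", "Second", "Third"]:
    model = ARIMA(df[column], order=(1, 1, 1)).fit()
    # Forecast one step past the last observation (2022-12-31 for the sample data)
    predictions[column] = model.forecast(steps=1).iloc[0]

print(predictions)

This code fits one ARIMA model per column, because ARIMA models a single series at a time, and forecasts the next value of the "First", "Second", and "Third" columns. The dates must be parsed and sorted in ascending order first; the sample data is listed newest-first.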

You can find more information about time series forecasting and the ARIMA model in the statsmodels documentation.

CodePudding user response：

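Here you are trying to predict the trend of a random winning number, so linear regression is a simple option to try first. Keep in mind that if the draws are truly random and independent, there is no trend in the past results for linear regression, or any other model, to learn.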
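As a rough sketch of the linear regression idea, the snippet below fits a straight line per column against the number of days since the first date, then extrapolates to 2022-12-31. It uses only the three sample rows from the question, and predict_linear is a hypothetical helper written for this illustration; with real data you would load the full file with pd.read_csv instead.

```python
import numpy as np
import pandas as pd

# The three sample rows from the question; real data would come from pd.read_csv.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2022-12-30", "2022-12-29", "2022-12-28"]),
        "First": [5402, 3804, 3192],
        "Second": [8694, 8529, 2779],
        "Third": [8648, 6690, 2166],
    }
).sort_values("Date")


def predict_linear(frame, target_date):
    """Fit value = slope * day + intercept for each column and extrapolate.

    Hypothetical helper for illustration only.
    """
    first_day = frame["Date"].min()
    # Days elapsed since the earliest date, used as the single feature.
    days = (frame["Date"] - first_day).dt.days.to_numpy(dtype=float)
    target = float((pd.Timestamp(target_date) - first_day).days)
    result = {}
    for column in ["First", "Second", "Third"]:
        values = frame[column].to_numpy(dtype=float)
        # Least-squares straight-line fit (degree-1 polynomial).
        slope, intercept = np.polyfit(days, values, deg=1)
        result[column] = slope * target + intercept
    return result


predictions = predict_linear(df, "2022-12-31")
print(predictions)
```

A straight-line extrapolation is not constrained to the four-digit range 0000 to 9999, so the results would need rounding and clipping before they could be read as lottery numbers.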