I’m new to Julia and I am trying to implement one-vs-rest multi-class classification, and I was wondering if anyone could help me out. Here is a snippet of my code so far. My data frame is basic since I’m trying to figure out the implementation first: my c column is my class, consisting of [0, 1, 2], and y, x1, x2, x3 are random Int64 values.
using DataFrames
using CSV
using StatsBase
using StatsModels
using Statistics
using Plots, StatsPlots
using GLM
using Lathe
df = DataFrame(CSV.File("data.csv"))
fm = @formula(c ~ x1 + x2 + x3 + y)
model0 = glm(fm0, df, Binomial(), ProbitLink()) # 0 vs [1,2]
model1 = glm(fm1, df, Binomial(), ProbitLink()) # 1 vs [0,2]
model2 = glm(fm2, df, Binomial(), ProbitLink()) # 2 vs [0,1]
I am trying to fit the logistic models but I don't know how to do it. If anyone can help me out, I would be thrilled.
I want to split the multi-class dataset into multiple binary classification problems, train a binary classifier on each one, and make predictions using the model that is most confident. My only problem is that I don't know how to write the logistic model for a multi-class dataset.
CodePudding user response:
Here is how you can do the same manually using GLM.jl (there is a lot of boilerplate code, but I wanted to keep the example simple):
using DataFrames, GLM

df = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100), target=rand([0, 1, 2], 100));
model0 = glm(@formula((target == 0) ~ x1 + x2 + x3), df, Binomial(), ProbitLink()) # 0 vs [1,2]
model1 = glm(@formula((target == 1) ~ x1 + x2 + x3), df, Binomial(), ProbitLink()) # 1 vs [0,2]
model2 = glm(@formula((target == 2) ~ x1 + x2 + x3), df, Binomial(), ProbitLink()) # 2 vs [0,1]
choice = argmax.(eachrow([predict(model0) predict(model1) predict(model2)])) .- 1 # subtract 1 because the class labels are 0-based
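If you later want to classify unseen rows, you can pass a new table to predict. A minimal sketch (the newdf below is made up; depending on your StatsModels/GLM versions you may need a dummy target column, since target appears on the left-hand side of the formulas):

newdf = DataFrame(x1=rand(5), x2=rand(5), x3=rand(5), target=zeros(Int, 5)) # dummy target column; only x1, x2, x3 feed the predictions
probs = [predict(model0, newdf) predict(model1, newdf) predict(model2, newdf)]
newchoice = argmax.(eachrow(probs)) .- 1 # again subtract 1 for the 0-based class labels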
CodePudding user response:
Personally, I would choose a pure-Julia implementation for this, so Bogumił Kamiński's answer is superior to mine.
I don't know of any package that provides a multi-target/label logistic regression model implemented fully in Julia (if there is one, I would like to know, and I'll prepend it to this answer). But you can apply the model using ScikitLearn.jl, which is a wrapper for scikit-learn in Python and uses a connected Python session to run the code. You can find further information in their repository. I created synthetic data as similar as I could to what you have, to train the model on and show you how you can do it:
#- Packages
using DataFrames
using ScikitLearn
@sk_import linear_model: LogisticRegression
#- Synthetic data
df = DataFrame(
x1=rand(100),
x2=rand(100),
x3=rand(100),
target=rand([0, 1, 2], 100)
)
# 100×4 DataFrame
#  Row │ x1        x2        x3        target
#      │ Float64   Float64   Float64   Int64
# ─────┼────────────────────────────────────────
#    1 │ 0.607024  0.6818    0.562058       0
#    2 │ 0.235538  0.974469  0.553292       1
#   ⋮  │    ⋮         ⋮         ⋮         ⋮
#   99 │ 0.382491  0.224192  0.122515       1
#  100 │ 0.617425  0.793276  0.228549       0
#- Split data
train, test = df[1:80, :], df[81:end, :]
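This split assumes the rows are in random order, which holds for the synthetic data above; with real data you would typically shuffle first. A sketch using the Random standard library:

using Random
perm = shuffle(1:nrow(df)) # random permutation of the row indices
train, test = df[perm[1:80], :], df[perm[81:end], :]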
Then I train the LogisticRegression (which under the hood runs the corresponding scikit-learn object in Python):
#- Train model
model = LogisticRegression(multi_class="ovr") # one-vs-rest scheme
fit!(model, Matrix(train[:, 1:3]), train[:, 4])
And the last phase would be the prediction:
#- Predict
preds = predict(model, Matrix(test[:, 1:3]));
#- Count right predictions
sum(preds .== test[:, 4])
# returns `6` in my case
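If you prefer accuracy as a fraction of the 20 test rows rather than a raw count, mean from the Statistics standard library does it:

using Statistics
mean(preds .== test[:, 4]) # fraction of correct predictions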
Note that you need to install PyCall.jl to use ScikitLearn.jl. Make sure to follow the instructions provided by PyCall.jl to set up the required Python environment first.
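For reference, a minimal installation sketch (the exact Python setup depends on your system, so treat this as a starting point):

using Pkg
Pkg.add("PyCall") # Python interop layer required by ScikitLearn.jl
Pkg.add("ScikitLearn")
# By default PyCall manages its own Conda-based Python;
# scikit-learn must be available in that Python environment.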