I am new to R and am playing around with the data from here. I tried to oversample it, but I get an error from model.frame.default.
- First attempt:
oversample_data <- ovun.sample(class ~ ., data = sample_dataset, p = 0.5, seed = 1, method="over")$data
But this error is shown:
Error in model.frame.default(formula = class ~ step type amount : object is not a matrix
- So, for a second attempt, I converted the dataset to a matrix first and then oversampled it:
org_dataset <- as.matrix(org_dataset[complete.cases(org_dataset), ])
data_balanced_over <- ovun.sample(class ~ ., data = org_dataset, p = 0.5, seed = 1, method = "over")$data
But it says
Error in model.frame.default(formula = class ~ step type amount : 'data' must be a data.frame, not a matrix or an array
This confuses me. What is the right way to do oversampling?
CodePudding user response:
The problem is the formula you're passing to ovun.sample. There is no variable named class in the dataset you're referring to.
The documentation of the ROSE package says this about the formula argument:
The left-hand-side (response) should be a vector specifying the class labels. The right-hand-side should be a series of vectors with the predictors.
Thus, you'll have to specify a variable holding the class labels. Given the dataset, I assume this would be isFraud. The call would then be:
oversample_data <- ovun.sample(isFraud ~ ., data = sample_dataset, p = 0.5, seed = 1, method="over")$data
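To check this before calling ovun.sample, something like the following sketch may help. It builds a small made-up data frame (only the column names step, type, amount and isFraud come from the question and answer; the values are invented), and uses a hypothetical helper, response_in_data, which is not part of ROSE, to test whether the left-hand side of the formula is actually a column. As the second error message says, the data should stay a data frame, so there is no as.matrix step. The ovun.sample call only runs if ROSE is installed.

```r
# Toy stand-in for the real data: column names follow the question,
# the values are made up for illustration only.
set.seed(1)
sample_dataset <- data.frame(
  step    = 1:20,
  type    = factor(rep(c("PAYMENT", "TRANSFER"), 10)),
  amount  = round(runif(20, 10, 1000), 2),
  isFraud = c(rep(0, 17), rep(1, 3))
)

# Hypothetical helper (not part of ROSE): TRUE if the variable on the
# left-hand side of the formula is a column of the data frame.
response_in_data <- function(formula, data) {
  all.vars(formula)[1] %in% names(data)
}

response_in_data(class ~ ., sample_dataset)    # no column called "class"
response_in_data(isFraud ~ ., sample_dataset)  # isFraud is a column

# Keep the data as a data frame and oversample the minority class.
# Guarded so the sketch still runs when ROSE is not installed.
if (requireNamespace("ROSE", quietly = TRUE)) {
  oversample_data <- ROSE::ovun.sample(isFraud ~ ., data = sample_dataset,
                                       p = 0.5, seed = 1, method = "over")$data
  print(table(oversample_data$isFraud))
}
```

On the real dataset, names(sample_dataset) shows which columns are available to use as the response.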