I am learning R with the Fraud Transaction data. When I try to use ROSE to handle the imbalanced dataset, the error "The current implementation of ROSE handles only continuous and categorical variables" pops up.
Here's what I tried:
str(dataset)
'data.frame': 6362620 obs. of 13 variables:
$ step : int 1 1 1 1 1 1 1 1 1 1 ...
$ type : chr "PAYMENT" "PAYMENT" "TRANSFER" "CASH_OUT" ...
$ amount : num 9840 1864 181 181 11668 ...
$ nameOrig : chr "C1231006815" "C1666544295" "C1305486145" "C840083671" ...
$ oldbalanceOrg : num 170136 21249 181 181 41554 ...
$ newbalanceOrig : num 160296 19385 0 0 29886 ...
$ nameDest : chr "M1979787155" "M2044282225" "C553264065" "C38997010" ...
$ oldbalanceDest : num 0 0 0 21182 0 ...
$ newbalanceDest : num 0 0 0 0 0 ...
$ isFraud : int 0 0 1 1 0 0 0 0 0 0 ...
$ isFlaggedFraud : int 0 0 0 0 0 0 0 0 0 0 ...
$ balancedOfOrigin: num -9840 -1864 -181 -181 -11668 ...
$ balancedOfDest : num 0 0 0 21182 0 ...
datadata_ROSE <- ROSE(isFraud~., data = dataset, N = 500, seed = 111)$data
With Error:
Error in rose.sampl(n, N, p, ind.majo, majoY, ind.mino, minoY, y, classy, : The current implementation of ROSE handles only continuous and categorical variables.
Debugging:
# change the isFraud attribute into category 0/1
dataset$isFraud = as.factor(dataset$isFraud)
datadata_ROSE <- ROSE(isFraud~., data = dataset, N = 500, seed = 111)$data
The error still occurs after this change. How can I prepare the dataset so that ROSE accepts it?
CodePudding user response:
As you can see in your str output, type, nameOrig, and nameDest are still character, not factor. ROSE will work once you change them to factors. However, looking at nameOrig and nameDest, they do not seem appropriate to include in ROSE.
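library(dplyr)  # for %>% and mutate()
library(ROSE)

# Work on a small sample while testing.
dummy2 <- head(dataset, 100)
dummy2$isFraud <- as.factor(dummy2$isFraud)
# Additional part: convert the character columns to factors.
dummy2 <- dummy2 %>%
  mutate(type = factor(type),
         nameDest = factor(nameDest),
         nameOrig = factor(nameOrig))
dummy3 <- ROSE(isFraud ~ ., data = dummy2, N = 500, seed = 111)$data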
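If you decide to leave nameOrig and nameDest out, a minimal sketch could look like the one below. It assumes those two columns are per-account identifiers (the str output suggests this, but I have not checked the full data), and it uses a small made-up data frame called toy in place of your dataset, so the values are purely illustrative.

```r
library(ROSE)

# Hypothetical toy data standing in for the fraud dataset (values are made up).
set.seed(1)
n <- 200
toy <- data.frame(
  type     = sample(c("PAYMENT", "TRANSFER", "CASH_OUT"), n, replace = TRUE),
  amount   = runif(n, 1, 10000),
  nameOrig = paste0("C", seq_len(n)),
  nameDest = paste0("M", seq_len(n)),
  isFraud  = rep(c(0, 1), times = c(180, 20)),
  stringsAsFactors = FALSE
)

# Drop the ID-like columns (assumption: they carry no predictive signal).
toy$nameOrig <- NULL
toy$nameDest <- NULL

# Convert every remaining character column, and the response, to a factor.
chr_cols <- vapply(toy, is.character, logical(1))
toy[chr_cols] <- lapply(toy[chr_cols], factor)
toy$isFraud <- factor(toy$isFraud)

# ROSE now sees only numeric and factor columns.
balanced <- ROSE(isFraud ~ ., data = toy, N = 500, seed = 111)$data
table(balanced$isFraud)
```

The same three steps (drop the ID columns, convert character columns to factors, convert isFraud to a factor) should carry over to the real dataset.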