How to solve the wrong variable type error when handling imbalance dataset by ROSE in R?

Time:10-06

I am learning R with the Fraud Transaction data. When I try to use ROSE to handle the imbalanced dataset, I get an error saying that ROSE handles only continuous and categorical variables.

Here's what I tried:

str(dataset)
'data.frame':   6362620 obs. of  13 variables:
 $ step            : int  1 1 1 1 1 1 1 1 1 1 ...
 $ type            : chr  "PAYMENT" "PAYMENT" "TRANSFER" "CASH_OUT" ...
 $ amount          : num  9840 1864 181 181 11668 ...
 $ nameOrig        : chr  "C1231006815" "C1666544295" "C1305486145" "C840083671" ...
 $ oldbalanceOrg   : num  170136 21249 181 181 41554 ...
 $ newbalanceOrig  : num  160296 19385 0 0 29886 ...
 $ nameDest        : chr  "M1979787155" "M2044282225" "C553264065" "C38997010" ...
 $ oldbalanceDest  : num  0 0 0 21182 0 ...
 $ newbalanceDest  : num  0 0 0 0 0 ...
 $ isFraud         : int  0 0 1 1 0 0 0 0 0 0 ...
 $ isFlaggedFraud  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ balancedOfOrigin: num  -9840 -1864 -181 -181 -11668 ...
 $ balancedOfDest  : num  0 0 0 21182 0 ...

library(ROSE)

datadata_ROSE <- ROSE(isFraud ~ ., data = dataset, N = 500, seed = 111)$data

This fails with the following error:

Error in rose.sampl(n, N, p, ind.majo, majoY, ind.mino, minoY, y, classy, : The current implementation of ROSE handles only continuous and categorical variables.
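A quick way to see which columns trigger this check is to list every column that is neither numeric nor a factor. This is a base R sketch only: the helper non_rose_columns() and the three-row toy data frame (built from the first values in the str() output above) are hypothetical, and treating the check as "numeric or factor" is an assumption based on the wording of the error message.

```r
# Hypothetical helper: names of the columns that are neither numeric nor
# factor, i.e. the kind of column the ROSE error message complains about.
non_rose_columns <- function(df) {
  ok <- vapply(df, function(col) is.numeric(col) || is.factor(col), logical(1))
  names(df)[!ok]
}

# Toy data frame reusing a few values from the str() output (illustration only).
toy <- data.frame(
  step     = c(1L, 1L, 1L),
  type     = c("PAYMENT", "PAYMENT", "TRANSFER"),
  amount   = c(9840, 1864, 181),
  nameOrig = c("C1231006815", "C1666544295", "C1305486145"),
  nameDest = c("M1979787155", "M2044282225", "C553264065"),
  isFraud  = c(0L, 0L, 1L),
  stringsAsFactors = FALSE
)

non_rose_columns(toy)
# returns c("type", "nameOrig", "nameDest")
```

On the full dataset the same call would be expected to flag type, nameOrig and nameDest. isFraud is an integer column, so it already counts as numeric and is not what the check is complaining about.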

Debugging:

# convert isFraud into a factor with levels 0/1
dataset$isFraud <- as.factor(dataset$isFraud)
datadata_ROSE <- ROSE(isFraud ~ ., data = dataset, N = 500, seed = 111)$data

However, the error persists. How can I prepare the dataset so that ROSE accepts it?

Answer:

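As your str() output shows, type, nameOrig and nameDest are still character columns, not factors. ROSE only handles continuous (numeric) and categorical (factor) variables, so converting these columns to factors makes the call work. However, nameOrig and nameDest look like account identifiers, and it does not seem appropriate to include them in ROSE.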
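The conversion can also be done for every character column at once, while dropping the identifier columns mentioned above. This is a minimal base R sketch; prepare_for_rose() is a hypothetical helper, and the toy data frame again reuses a few values from the question's str() output.

```r
# Hypothetical helper: drop identifier columns, turn every remaining
# character column into a factor, and make the response a factor.
prepare_for_rose <- function(df, response, id_cols = character(0)) {
  # Remove columns that only identify accounts (here assumed: nameOrig, nameDest)
  df <- df[, setdiff(names(df), id_cols), drop = FALSE]
  # Character -> factor; numeric and factor columns are left unchanged
  df[] <- lapply(df, function(col) if (is.character(col)) factor(col) else col)
  # The response should be a two-level factor
  df[[response]] <- factor(df[[response]])
  df
}

toy <- data.frame(
  step     = c(1L, 1L, 1L),
  type     = c("PAYMENT", "PAYMENT", "TRANSFER"),
  amount   = c(9840, 1864, 181),
  nameOrig = c("C1231006815", "C1666544295", "C1305486145"),
  nameDest = c("M1979787155", "M2044282225", "C553264065"),
  isFraud  = c(0L, 0L, 1L),
  stringsAsFactors = FALSE
)

prepared <- prepare_for_rose(toy, response = "isFraud",
                             id_cols = c("nameOrig", "nameDest"))
str(prepared)
```

Using lapply() over all columns avoids listing each character column by name, which helps if the dataset later gains more text columns.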

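library(dplyr)
library(ROSE)

# try it on a small sample first
dummy2 <- head(dataset, 100)

# the response as a factor
dummy2$isFraud <- as.factor(dummy2$isFraud)

# additional part: convert the remaining character columns to factors
dummy2 <- dummy2 %>%
  mutate(type = factor(type),
         nameDest = factor(nameDest),
         nameOrig = factor(nameOrig))
dummy3 <- ROSE(isFraud ~ ., data = dummy2, N = 500, seed = 111)$data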
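Following the caveat about nameOrig and nameDest, here is a sketch that runs ROSE on data without those identifier columns. It assumes the ROSE package is installed and uses a made-up 40-row data set with a 34/6 class split instead of the real transactions; the final table shows how the synthetic sample is split between the two classes.

```r
# Made-up data for illustration: one factor predictor, one numeric predictor,
# and an imbalanced two-level factor response (no identifier columns).
set.seed(1)
n <- 40
toy <- data.frame(
  type    = factor(sample(c("PAYMENT", "TRANSFER", "CASH_OUT"), n, replace = TRUE)),
  amount  = round(runif(n, 100, 10000)),
  isFraud = factor(c(rep(0, 34), rep(1, 6)))
)

if (requireNamespace("ROSE", quietly = TRUE)) {
  library(ROSE)
  # ROSE() generates N synthetic rows; with the default p = 0.5 the two
  # classes should come out roughly balanced.
  balanced <- ROSE(isFraud ~ ., data = toy, N = 500, seed = 111)$data
  print(table(balanced$isFraud))
}
```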