How to keep a specific number of factor levels in a data frame

Time:07-07

I have a dataset in which a factor variable contains 140 levels. However, I only need 80 of those levels, selected at random. Is there an R function or script that can help me do this?

CodePudding user response:

You can do this in base R:

# reproducible dataset
set.seed(1)
nlevels <- 5   # number of distinct levels in the toy factor
nkeep <- 3     # number of levels to keep
string <- letters[1:nlevels]
string <- sample(string, nlevels * 2, replace = TRUE)
string <- as.factor(string)
string

[1] a d a b e c b c c a
Levels: a b c d e

# possible solution
keep <- sample(levels(string), nkeep)
string[string %in% keep]

[1] a a b e b a
Levels: a b c d e

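This draws nkeep levels at random and keeps only the values that belong to them. The unselected levels are still attached to the factor (see the Levels line above), so call droplevels() afterwards if you want to remove them.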
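As a minimal sketch of the droplevels() step, the example below uses a small hand-made factor and a fixed `keep` vector so the result is easy to follow. Both are assumptions for illustration only; in practice `keep` would come from sample(levels(f), nkeep).

```r
# Toy factor with five levels (hypothetical data, not from the question)
f <- factor(c("a", "d", "a", "b", "e", "c", "b", "c", "c", "a"))

# Levels to keep, fixed by hand here instead of sampled
keep <- c("a", "b", "e")

# Subsetting keeps the values but not a reduced level set
kept <- f[f %in% keep]
nlevels(kept)            # all five levels are still attached

# droplevels() removes the levels that no longer occur
kept <- droplevels(kept)
levels(kept)             # only the kept levels remain
```

Dropping the unused levels matters mostly for later steps such as table(), plotting, or model fitting, which would otherwise still report empty categories.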

CodePudding user response:

Say your variable is named x and is a factor with 140 levels. You could randomly select 80 of them as follows:

y <- factor(x, levels = sample(levels(x), 80))

Values of x whose level was not selected become NA in y.
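Since the question is about a data frame, here is a rough sketch of how the same idea might be applied to a whole table. The data frame `df`, its columns `group` and `value`, and the seed are all hypothetical, made up only to have something with 140 levels to work on.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical data frame: 'group' is a factor with 140 levels,
# each occurring at least once
ids <- sprintf("g%03d", 1:140)
df <- data.frame(
  group = factor(sample(rep(ids, length.out = 1000)), levels = ids),
  value = rnorm(1000)
)

# Randomly choose 80 of the 140 levels
keep <- sample(levels(df$group), 80)

# Option 1: keep only rows in the chosen levels, then drop unused levels
df_sub <- droplevels(df[df$group %in% keep, , drop = FALSE])
nlevels(df_sub$group)

# Option 2: re-create the factor; rows outside 'keep' become NA
y <- factor(df$group, levels = keep)
sum(is.na(y))
```

Option 1 removes the rows entirely, while option 2 keeps every row and marks the unselected ones as NA, so the choice depends on whether the other columns of those rows are still needed.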