I have a dataset in which a factor variable has 140 levels, but I only need 80 of them, selected at random. Is there an R function or script that can do this?
CodePudding user response:
You can do this in base R:
# reproducible dataset
set.seed(1)
nlevels <- 5
nkeep <- 3
string <- letters[1:5]
string <- sample(string, nlevels*2, replace = TRUE)
string <- as.factor(string)
string
[1] a d a b e c b c c a
Levels: a b c d e
# possible solution
keep <- sample(levels(string), nkeep)
string[string %in% keep]
[1] a a b e b a
Levels: a b c d e
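This picks nkeep levels at random and keeps only the values that belong to them. The unused levels stay attached to the result (see the Levels line above), so call droplevels() afterwards if you want to remove them.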
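As a minimal sketch of the droplevels step, assuming a small made-up factor (the names f, keep and kept are only illustrative, not from the question):

```r
# Hypothetical example data: a factor built from 10 random letters
set.seed(1)
f <- factor(sample(letters[1:5], 10, replace = TRUE))

# Randomly pick 3 of the levels and keep only the matching values
keep <- sample(levels(f), 3)
kept <- f[f %in% keep]

# Subsetting alone does not remove unused levels
nlevels(kept) == nlevels(f)   # TRUE

# droplevels() removes the levels that no longer occur
kept <- droplevels(kept)
nlevels(kept)                 # 3
```

Because f is built directly from the data, every level in keep occurs at least once, so exactly 3 levels remain after droplevels().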
CodePudding user response:
Say your variable is named x and is a factor with 140 levels. You could randomly select 80 levels as follows (values whose level is not selected become NA in y):
y <- factor(x, levels = sample(levels(x), 80))
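A short sketch of what this approach does to the data, assuming a made-up factor x with 140 levels that each appear three times (the level names and y_kept are hypothetical):

```r
# Hypothetical data: 140 levels, each occurring 3 times, in random order
set.seed(42)
x <- factor(sample(rep(sprintf("lvl%03d", 1:140), times = 3)))

# Re-create the factor with 80 randomly chosen levels;
# values belonging to the other 60 levels become NA
y <- factor(x, levels = sample(levels(x), 80))

nlevels(y)               # 80
length(y) == length(x)   # TRUE: y keeps the original length
sum(is.na(y))            # 180 (60 dropped levels x 3 values each)

# Remove the NA entries if only the selected values are wanted
y_kept <- y[!is.na(y)]
length(y_kept)           # 240
```

Note that the levels of y appear in the sampled (random) order; wrapping the sample in sort() would keep them alphabetical if that matters.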