I have a data frame with more than 20 distinct values in its "S.A" column. A sample of the column is shown below:
structure(list(`temp$S.A[1:30]` = c("Yaletown", "Fairview VW",
"West End VW", "Fairview VW", "Downtown VW", "Hastings", "Yaletown",
"Main", "Marpole", "West End VW", "Yaletown", "Yaletown", "Kitsilano",
"Hastings East", "Grandview VE", "Grandview Woodland", "Downtown VW",
"Downtown VW", "West End VW", "Downtown VE", "West End VW", "West End VW",
"West End VW", "Yaletown", "Downtown VW", "West End VW", "Downtown VW",
"West End VW", "Yaletown", "West End VW")), row.names = c(NA,
-30L), class = "data.frame")
If I use the table function, I get a count for every possible value of S.A in my data frame.
Now I want to replace every name that occurs fewer than 100 times with "other". For example, "Arbutus" occurs fewer than 100 times in the full data, so I want to change all "Arbutus" values to "other" in order to reduce the number of categories. I tried this code to find the names:
aa <- as.data.frame(table(temp$S.A))
bb <- subset(aa, aa$Freq < 100)
cc <- bb[1]
This helps me find the names, but I am not sure how to continue and replace them.
CodePudding user response:
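To continue with what you already have, you can use
temp$S.A[temp$S.A %in% cc$Var1] <- 'Other'
to change all the values listed in cc to "Other". Note that cc is a one-column data frame (its column is named Var1), so the names have to be extracted with cc$Var1 before they are passed to %in%. If temp$S.A is a factor rather than a character column, convert it with as.character() first.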
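As a rough sketch of the base R route, the snippet below runs the same replacement on a small made-up sample. The data and the threshold of 3 are assumptions for illustration only (the threshold stands in for the 100 used on the full data), and the names `min_count`, `counts` and `rare` are hypothetical helpers, not part of the original code.

```r
# Toy data: a made-up sample that reuses a few names from the question
temp <- data.frame(
  S.A = c("Yaletown", "West End VW", "Yaletown", "Main", "West End VW",
          "Marpole", "West End VW", "Yaletown"),
  stringsAsFactors = FALSE
)

# Assumed threshold for the toy data; the question uses 100 on the full data
min_count <- 3

# Count each value and keep the names that occur fewer than min_count times
counts <- table(temp$S.A)
rare <- names(counts)[counts < min_count]

# Overwrite the rare values in place
temp$S.A[temp$S.A %in% rare] <- "Other"

print(table(temp$S.A))
```

Working from `names(counts)` gives a plain character vector, which avoids passing a data frame to `%in%`.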
However, forcats has a function that does this directly, fct_lump_min:
temp$S.A <- forcats::fct_lump_min(temp$S.A, 100)
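For a quick check of what fct_lump_min returns, here is a small sketch on made-up data. It assumes the forcats package is installed; the vector `s_a` and the threshold of 3 are illustrative assumptions, not values from the question.

```r
library(forcats)

# Toy data: a made-up sample that reuses a few names from the question
s_a <- c("Yaletown", "West End VW", "Yaletown", "Main", "West End VW",
         "Marpole", "West End VW", "Yaletown")

# Levels that appear fewer than `min` times are collapsed into "Other"
lumped <- fct_lump_min(s_a, min = 3)
print(table(lumped))

# The result is a factor; convert back if a character column is needed
lumped_chr <- as.character(lumped)
```

The label used for the lumped level can be changed with the other_level argument, for example other_level = "other" to match the spelling in the question.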