I have a question similar to "Replace range of values for factor with levels", but instead of two levels I would like to create four.
How do I do that? I don't know how to share my own data table so I will use the iris dataset.
library(datasets)
data(iris)
Let's say I want to categorize Sepal.Length into four categories (4.3-4.9, 5-6, 6.1-7, 7.1-7.9) and label each range A, B, C, D as factor levels in a new column. Can this be done using the dplyr package? I came across several similar questions that use the "cut" function, but I was not able to use it without getting an error message.
CodePudding user response:
You can use cut inside mutate. Pass Sepal.Length as the first argument, the vector of cut points you want to use as the breaks argument (it should have length 5, since five cut points define four intervals), and the labels you want to assign via the labels argument.
library(tidyverse)

iris %>%
  as_tibble() %>%
  mutate(newcol = cut(Sepal.Length, breaks = c(0, 1.9, 3.9, 5.9, 8),
                      labels = LETTERS[1:4]))
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species newcol
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <fct>
#>  1          5.1         3.5          1.4         0.2 setosa  C
#>  2          4.9         3            1.4         0.2 setosa  C
#>  3          4.7         3.2          1.3         0.2 setosa  C
#>  4          4.6         3.1          1.5         0.2 setosa  C
#>  5          5           3.6          1.4         0.2 setosa  C
#>  6          5.4         3.9          1.7         0.4 setosa  C
#>  7          4.6         3.4          1.4         0.3 setosa  C
#>  8          5           3.4          1.5         0.2 setosa  C
#>  9          4.4         2.9          1.4         0.2 setosa  C
#> 10          4.9         3.1          1.5         0.1 setosa  C
#> # ... with 140 more rows
Created on 2023-01-30 with reprex v2.0.2
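The break points in the example above are only illustrative, which is why every row shown lands in level C. To match the ranges in the question (4.3-4.9, 5-6, 6.1-7, 7.1-7.9), a minimal sketch could look like the following. It assumes that cut's default right-closed intervals are acceptable and that Sepal.Length is recorded to one decimal place, as it is in iris; the names iris_binned and Sepal.Group are made up for this illustration.

```r
library(dplyr)

# Five cut points give four right-closed intervals:
# (4.2, 4.9], (4.9, 6], (6, 7], (7, 7.9]
# With values recorded to one decimal place these correspond to
# 4.3-4.9, 5-6, 6.1-7 and 7.1-7.9 from the question.
iris_binned <- iris %>%
  mutate(Sepal.Group = cut(Sepal.Length,
                           breaks = c(4.2, 4.9, 6, 7, 7.9),
                           labels = c("A", "B", "C", "D")))

# Count rows per level; an NA group here would mean a value fell
# outside the outermost break points.
count(iris_binned, Sepal.Group)
```

The lowest break is set just below the minimum (4.2 rather than 4.3) because the first interval is open on the left; passing include.lowest = TRUE to cut with a lowest break of 4.3 would be an alternative.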