This is probably a super easy and basic question, but I can't find an answer. I have a data frame with some numbers, and I need to add another column to that data frame with strings that represent this data. What I mean, for example:
| num | result |
|-----|--------|
| 10  | Bad    |
| 20  | Good   |
| 15  | Bad    |
| 35  | Good   |
"Bad" and "Good" are intervals: for example, "Bad" is 0-20 and "Good" is 20+. How do I do that in R? I think I need to use an apply function.
Answer:
library(tidyverse)
data <- tibble(num = c(10, 20, 15, 35))
data
#> # A tibble: 4 × 1
#> num
#> <dbl>
#> 1 10
#> 2 20
#> 3 15
#> 4 35
data %>%
  mutate(result = ifelse(num <= 20, "Bad", "Good"))
#> # A tibble: 4 × 2
#> num result
#> <dbl> <chr>
#> 1 10 Bad
#> 2 20 Bad
#> 3 15 Bad
#> 4 35 Good
Created on 2022-05-10 by the reprex package (v2.0.0)
Or using base R:
data <- data.frame(num = c(10, 20, 15, 35))
data$result <- ifelse(data$num <= 20, "Bad", "Good")
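Since the question mentions apply, here is a minimal base R sketch of that style, for comparison only. It assumes the same data frame as above and uses vapply() (a type-checked relative of sapply()) with a hypothetical helper label_one(). It gives the same labels as the ifelse() line, which remains the simpler vectorized choice.

```r
# Same data as in the base R answer above
data <- data.frame(num = c(10, 20, 15, 35))

# apply-style: call a scalar labelling function once per value.
# label_one() is a hypothetical helper written for this sketch.
label_one <- function(n) if (n <= 20) "Bad" else "Good"
data$result_apply <- vapply(data$num, label_one, character(1))

# Vectorized: a single ifelse() call over the whole column
data$result <- ifelse(data$num <= 20, "Bad", "Good")

# Both approaches produce the same labels; this prints TRUE
identical(data$result_apply, data$result)
```

The vapply() version calls an R function once per element, so the vectorized ifelse() is typically both shorter and faster on large columns.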
If you have more than two levels (e.g. low, medium, and high), you can also use the function cut. There is also the function dplyr::case_when if the rules are more complex.
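A minimal base R sketch of the cut() idea with three levels. The break points (15 and 30) and the labels low/medium/high are hypothetical, chosen only for illustration. cut() returns a factor, so the sketch wraps it in as.character() to get a plain string column like the ifelse() answers.

```r
data <- data.frame(num = c(10, 20, 15, 35))

# Hypothetical break points and labels, for illustration only:
# [0, 15) -> "low", [15, 30) -> "medium", [30, Inf) -> "high"
lvl_breaks <- c(0, 15, 30, Inf)
lvl_labels <- c("low", "medium", "high")

# right = FALSE makes each interval closed on the left and open on the right.
# as.character() converts the factor returned by cut() into strings.
data$level <- as.character(
  cut(data$num, breaks = lvl_breaks, labels = lvl_labels, right = FALSE)
)
data
```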
Answer:
You can use mutate and case_when:
library(dplyr)

df %>%
  mutate(status = case_when(v1 <= 20 ~ "Bad",
                            v1 > 20 ~ "Good"))
Output:
v1 status
1 10 Bad
2 20 Bad
3 15 Bad
4 35 Good
Data
df <- data.frame(v1 = c(10,20,15,35))
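Note that the question's table labels 20 as "Good" (Bad = [0, 20), Good = [20, Inf)), while the <= 20 rule used in the answers labels it "Bad". A minimal base R sketch, assuming 20 should count as "Good": switch to a strict < 20 test. The same change applies to the case_when() conditions (v1 < 20 ~ "Bad", v1 >= 20 ~ "Good").

```r
df <- data.frame(v1 = c(10, 20, 15, 35))

# Strict "< 20" puts the boundary value 20 in "Good",
# matching the table in the question; "<= 20" would put it in "Bad".
df$status <- ifelse(df$v1 < 20, "Bad", "Good")
df
```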
Answer:
You can use cut:
x <- c(10, 20, 15, 35)
cut(x, c(0,20, Inf), c("Bad", "Good"), right=FALSE)
#[1] Bad Good Bad Good
This can easily be extended to more levels of gradation:
cut(x, c(0, 12, 25, Inf), c("Bad", "Normal", "Good"), right=FALSE)
#[1] Bad Normal Normal Good
Or index into a vector:
c("Bad", "Good")[1 + (x >= 20)]
#[1] "Bad" "Good" "Bad" "Good"
Or using findInterval:
c("Bad", "Normal", "Good")[1 + findInterval(x, c(12, 25))]
#[1] "Bad" "Normal" "Normal" "Good"
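The findInterval() trick can be assigned straight to a data frame column, which is what the question asks for. A minimal sketch, reusing the break points 12 and 25 from above; the column name grade and the variable grade_labels are hypothetical. findInterval() returns, for each value, how many break points are less than or equal to it, so adding 1 turns that count into a position in the label vector.

```r
df <- data.frame(num = c(10, 20, 15, 35))

# findInterval() gives 0 for num < 12, 1 for 12 <= num < 25, 2 for num >= 25.
# Adding 1 maps those counts onto positions 1, 2, 3 of the label vector.
grade_labels <- c("Bad", "Normal", "Good")
df$grade <- grade_labels[1 + findInterval(df$num, c(12, 25))]
df
```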