Hello, I have the following data frame:
n<-c(2,8,9,3,7,5,7,6,3,8,2,9,10,1)
tab<-data.frame("note"=n)
I need to add a new column that classifies each number: if it is less than 3 it will be group 1, if it is greater than 5 it will be group 2, from 5 to 7 it will be group 3, and from 7 to 10 it will be group 4, as shown below.
CodePudding user response:
One option is to use case_when to define the groups:
library(dplyr)
tab %>%
  mutate(groups = case_when(note < 3 ~ 1,
                            note >= 3 & note < 7 ~ 2,
                            note == 7 ~ 3,
                            TRUE ~ 4))
Or another option using cut:
tab %>%
  mutate(groups = cut(note, breaks = c(0, 2, 6, 7, 10), labels = 1:4))
Output
   note groups
1     2      1
2     8      4
3     9      4
4     3      2
5     7      3
6     5      2
7     7      3
8     6      2
9     3      2
10    8      4
11    2      1
12    9      4
13   10      4
14    1      1
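To see why those break points give these groups, it can help to call cut without labels, so that it reports the interval each value falls into. This is only a small sketch on the question's data; the variable name intervals is made up for illustration. By default cut uses right-closed intervals, so 2 lands in group 1 and 7 lands in group 3.

```r
n <- c(2, 8, 9, 3, 7, 5, 7, 6, 3, 8, 2, 9, 10, 1)

# Without labels, cut() names each level after its interval.
# The default right = TRUE makes every interval closed on the right.
intervals <- cut(n, breaks = c(0, 2, 6, 7, 10))

levels(intervals)
# "(0,2]"  "(2,6]"  "(6,7]"  "(7,10]"

# Number of values per interval: 3, 4, 2 and 5
table(intervals)
```

Note that a value of 0 or below, or above 10, would fall outside every interval and come back as NA.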
CodePudding user response:
Base R (borrowing heavily from the latemail and AndrewGB) with a reusable function:
# Function to group the numeric data:
# group_numeric_data => function()
group_numeric_data <- function(num_vec, break_points){
  # Compute the group values: group_vals => integer vector
  group_vals <- seq_along(break_points)[-length(break_points)]
  # Compute the groups: res => factor vector
  res <- cut(
    num_vec,
    breaks = break_points,
    labels = group_vals
  )
  # Explicitly define returned object: factor vector => env
  return(res)
}
# Define the break points: break_points => numeric vector
break_points <- c(-Inf, 2, 6, 7, 10)
# Apply the function: groups => factor vector
tab$groups <- group_numeric_data(
  tab$note,
  break_points
)
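One thing to watch with a reusable function like this is what happens to values outside the break points. The sketch below repeats a condensed version of the helper so it runs on its own. The vector new_notes and the result names capped and open_ended are hypothetical, and using Inf as the last break is an assumption about the behaviour you want, not something stated in the question.

```r
# Condensed copy of the helper above, so this snippet is self-contained.
group_numeric_data <- function(num_vec, break_points) {
  cut(
    num_vec,
    breaks = break_points,
    labels = seq_along(break_points)[-length(break_points)]
  )
}

# Hypothetical values, including one above the largest break point.
new_notes <- c(0, 2, 7, 10, 11)

# With 10 as the last break, 11 matches no interval and becomes NA.
capped <- group_numeric_data(new_notes, c(-Inf, 2, 6, 7, 10))

# With Inf as the last break, everything above 7 goes to group 4.
open_ended <- group_numeric_data(new_notes, c(-Inf, 2, 6, 7, Inf))
```

Here capped is 1, 1, 3, 4, NA and open_ended is 1, 1, 3, 4, 4, so the choice of outer break points decides whether out-of-range notes are flagged as missing or absorbed into the last group.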