Create groups in a dataframe in R

Time:05-30

Hello, I have the following dataframe:


n<-c(2,8,9,3,7,5,7,6,3,8,2,9,10,1)
tab<-data.frame("note"=n)

I need to add a new column that classifies each number: if it is less than 3 it will be group 1, if it is greater than 5 it will be group 2, from 5 to 7 it will be group 3, and from 7 to 10 it will be group 4.

CodePudding user response:

One option is to use case_when to define the groups:

library(dplyr)

tab %>%
  mutate(groups = case_when(note < 3 ~ 1,
                           note >= 3 & note < 7 ~ 2,
                           note == 7 ~ 3,
                           TRUE ~ 4))

Or another option using cut:

tab %>%
  mutate(groups = cut(note, breaks = c(0, 2, 6, 7, 10), labels = 1:4))

Output

   note groups
1     2     1
2     8     4
3     9     4
4     3     2
5     7     3
6     5     2
7     7     3
8     6     2
9     3     2
10    8     4
11    2     1
12    9     4
13   10     4
14    1     1
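
To see which interval each cut label stands for, one can drop the labels argument and let cut name the levels after the intervals themselves. A minimal base R sketch, reusing the note values from the question (the names breaks, intervals and groups_int are only illustrative, not part of the answer above):

```r
# Values from the question
n <- c(2, 8, 9, 3, 7, 5, 7, 6, 3, 8, 2, 9, 10, 1)
breaks <- c(0, 2, 6, 7, 10)

# Without labels, cut() names each level after its interval;
# by default intervals are open on the left and closed on the right
intervals <- cut(n, breaks = breaks)
levels(intervals)

# cut() returns a factor; as.integer() gives the level position,
# which here equals the group number
groups_int <- as.integer(cut(n, breaks = breaks, labels = 1:4))
groups_int
```

Note that case_when as written above produces a numeric column, whereas cut produces a factor; the printed data frame looks the same either way.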

CodePudding user response:

Base R (borrowing heavily from latemail and AndrewGB) with a reusable function:

# Function to group the numeric data:
# group_numeric_data => function()
group_numeric_data <- function(num_vec, break_points){

  # Compute the group values: group_vals => integer vector
  group_vals <- seq_along(break_points)[-length(break_points)]

  # Compute the groups: res => factor vector
  res <- cut(
    num_vec,
    breaks = break_points,
    labels = group_vals
  )

  # Explicitly define returned object: factor vector => env
  return(res)

}

# Define the break points: break_points => numeric vector
break_points <- c(-Inf, 2, 6, 7, 10)

# Apply the function: groups => factor vector
tab$groups <- group_numeric_data(
  tab$note, 
  break_points
)
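
The lower break of -Inf used here differs from the 0 used in the earlier cut call, and the difference only shows up for values outside 1 to 10. A small sketch of that edge behaviour, using made-up test values (edge_vals, with_zero and with_inf are hypothetical names, not from the question):

```r
# Made-up values around the edges of the break points
edge_vals <- c(-1, 0, 1, 10, 11)

# Lower break 0: the first interval is (0, 2], so 0 and negatives become NA
with_zero <- cut(edge_vals, breaks = c(0, 2, 6, 7, 10), labels = 1:4)

# Lower break -Inf: everything up to and including 2 lands in group 1
with_inf <- cut(edge_vals, breaks = c(-Inf, 2, 6, 7, 10), labels = 1:4)

# Values above the last break (here 11) are NA in both cases
data.frame(value = edge_vals, with_zero, with_inf)
```

Replacing the last break with Inf would place larger values in group 4 instead of NA, if that is the behaviour wanted.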