Group by and mutate a new column based on whether one column falls within specific ranges of another

Time:06-10

Given a df as follows:

df <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B", 
"B"), pred_val = c(22.52, 21.87, 31.45, 21.45, 19.99, 13.96, 
15.97, 6.5, 19.89), actual_val = c(21L, 21L, 21L, 21L, 21L, 16L, 
16L, 16L, 16L)), class = "data.frame", row.names = c(NA, -9L))

Out:

group pred_val actual_val
A   22.52   21      
A   21.87   21      
A   31.45   21      
A   21.45   21      
A   19.99   21      
B   13.96   16      
B   15.97   16      
B   6.50    16      
B   19.89   16  

Suppose I need to group by the group column and then create a new column acc_level. More specifically, for each group: if pred_val is within actual_val ± 2, return good; if it is within actual_val ± 5 but not within actual_val ± 2, return medium; if it falls outside both ranges, return poor.

How could I achieve that using dplyr or other packages in R? Thanks.

Pseudo code:

df %>% group_by(group) %>%
  mutate(acc_level = case_when((pred_val isin actual_val ±2) ~ 'good', (pred_val isin actual_val ±5) ~ 'medium', otherwise ~ 'poor'))

Expected output:

[Image: expected output table]

CodePudding user response:

library(dplyr)

df %>%
  group_by(group) %>%
  # First store the absolute error, then overwrite it with its label
  mutate(acc_level = abs(pred_val - actual_val),
         acc_level = case_when(acc_level <= 2 ~ 'good',
                               acc_level <= 5 ~ 'medium',
                               TRUE ~ 'poor'))

# A tibble: 9 x 4
# Groups:   group [2]
  group pred_val actual_val acc_level
  <chr>    <dbl>      <int> <chr>    
1 A         22.5         21 good     
2 A         21.9         21 good     
3 A         31.4         21 poor     
4 A         21.4         21 good     
5 A         20.0         21 good     
6 B         14.0         16 medium   
7 B         16.0         16 good     
8 B          6.5         16 poor     
9 B         19.9         16 medium   
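
For readers who want something closer to the "isin range" wording of the pseudo code, here is a minimal sketch with explicit lower and upper bounds. Note that in_range is a hypothetical helper written for this example, not a dplyr function. The sketch also skips group_by, on the assumption that actual_val is already present on every row, so the comparison is purely row-wise.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  pred_val = c(22.52, 21.87, 31.45, 21.45, 19.99, 13.96, 15.97, 6.5, 19.89),
  actual_val = c(21L, 21L, 21L, 21L, 21L, 16L, 16L, 16L, 16L)
)

# Hypothetical helper (not part of dplyr):
# TRUE where x lies within centre - tol and centre + tol, bounds included
in_range <- function(x, centre, tol) {
  x >= centre - tol & x <= centre + tol
}

res <- df %>%
  mutate(acc_level = case_when(
    in_range(pred_val, actual_val, 2) ~ "good",    # checked first
    in_range(pred_val, actual_val, 5) ~ "medium",  # only reached if not "good"
    TRUE ~ "poor"
  ))

print(res)
```

Because case_when evaluates its conditions in order, the "medium" branch only applies to rows that did not already match "good", which gives the "within ± 5 but not within ± 2" behaviour without an extra condition.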

CodePudding user response:

Here is another option using fcase from data.table:

library(data.table)

# setDT() converts df to a data.table by reference; := then adds acc_level in place
setDT(df)[, acc_level := fcase(
  abs(pred_val - actual_val) <= 2, "good",
  abs(pred_val - actual_val) <= 5, "medium",
  default = "poor"
), by = group]

Output

    group pred_val actual_val acc_level
   <char>    <num>      <int>    <char>
1:      A    22.52         21      good
2:      A    21.87         21      good
3:      A    31.45         21      poor
4:      A    21.45         21      good
5:      A    19.99         21      good
6:      B    13.96         16    medium
7:      B    15.97         16      good
8:      B     6.50         16      poor
9:      B    19.89         16    medium
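
If you would rather avoid extra packages, the same binning can probably be done in base R with cut() on the absolute error. This is only a sketch: it assumes the thresholds 2 and 5 from the question, and it returns a factor rather than a character column.

```r
# Same data as in the question
df <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  pred_val = c(22.52, 21.87, 31.45, 21.45, 19.99, 13.96, 15.97, 6.5, 19.89),
  actual_val = c(21L, 21L, 21L, 21L, 21L, 16L, 16L, 16L, 16L)
)

# Absolute error between prediction and actual value, row by row
abs_err <- abs(df$pred_val - df$actual_val)

# cut() uses right-closed intervals by default:
# (-Inf, 2] -> good, (2, 5] -> medium, (5, Inf) -> poor
df$acc_level <- cut(
  abs_err,
  breaks = c(-Inf, 2, 5, Inf),
  labels = c("good", "medium", "poor")
)

print(df)
```

Wrap the result in as.character() if a plain character column is needed.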