Given a data frame df as follows:
df <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B",
"B"), pred_val = c(22.52, 21.87, 31.45, 21.45, 19.99, 13.96,
15.97, 6.5, 19.89), actual_val = c(21L, 21L, 21L, 21L, 21L, 16L,
16L, 16L, 16L)), class = "data.frame", row.names = c(NA, -9L))
Out:
group pred_val actual_val
A 22.52 21
A 21.87 21
A 31.45 21
A 21.45 21
A 19.99 21
B 13.96 16
B 15.97 16
B 6.50 16
B 19.89 16
Let's say I need to group by the group column and then create a new column, acc_level. More specifically, for each group: if pred_val is within actual_val ± 2, return good as the acc_level; if it is within actual_val ± 5 but not within actual_val ± 2, return medium; outside those ranges, return poor.
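Stated as code, the rule is a pair of nested range checks. The base R sketch below is only an illustration: acc_level_of is a hypothetical helper name, and it assumes "in the range" means inclusive bounds (so a difference of exactly 2 counts as good).

```r
# Hypothetical helper (not from the question): classify each prediction by
# whether it lies inside actual ± 2, else inside actual ± 5, else neither.
# Assumes inclusive bounds on both ranges.
acc_level_of <- function(pred, actual) {
  ifelse(pred >= actual - 2 & pred <= actual + 2, "good",
         ifelse(pred >= actual - 5 & pred <= actual + 5, "medium", "poor"))
}

# Same values as df$pred_val and df$actual_val above
pred   <- c(22.52, 21.87, 31.45, 21.45, 19.99, 13.96, 15.97, 6.5, 19.89)
actual <- c(21, 21, 21, 21, 21, 16, 16, 16, 16)
acc_level_of(pred, actual)
```

Since the ± 2 range is checked first, anything reaching the second check is already known to be outside ± 2, which gives the "but not in actual_val ± 2" condition for free.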
How could I achieve that use dplyr
or other packages in R? Thanks.
Pseudocode:
df %>%
  group_by(group) %>%
  mutate(acc_level = case_when(
    (pred_val isin actual_val ±2) ~ 'good',
    (pred_val isin actual_val ±5) ~ 'medium',
    otherwise ~ 'poor'))
Expected output:
Answer:
library(dplyr)

df %>%
  group_by(group) %>%
  mutate(acc_level = case_when(
    abs(pred_val - actual_val) <= 2 ~ "good",    # within actual_val ± 2
    abs(pred_val - actual_val) <= 5 ~ "medium",  # within ± 5 but not ± 2
    TRUE ~ "poor"                                # outside ± 5
  ))
# A tibble: 9 x 4
# Groups: group [2]
group pred_val actual_val acc_level
<chr> <dbl> <int> <chr>
1 A 22.5 21 good
2 A 21.9 21 good
3 A 31.4 21 poor
4 A 21.4 21 good
5 A 20.0 21 good
6 B 14.0 16 medium
7 B 16.0 16 good
8 B 6.5 16 poor
9 B 19.9 16 medium
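Because the rule compares each row's pred_val only with that same row's actual_val, the grouping does not change the result. As an alternative sketch (not from the answers above), base R's cut() can bin the absolute difference directly; with its default right = TRUE the intervals are (-Inf, 2], (2, 5] and (5, Inf], matching the <= comparisons. Note that cut() returns a factor rather than a character column.

```r
# Rebuild the question's data so this block runs on its own
df <- data.frame(
  group      = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  pred_val   = c(22.52, 21.87, 31.45, 21.45, 19.99, 13.96, 15.97, 6.5, 19.89),
  actual_val = c(21L, 21L, 21L, 21L, 21L, 16L, 16L, 16L, 16L)
)

# Bin |pred_val - actual_val| into right-closed intervals:
# (-Inf, 2] -> good, (2, 5] -> medium, (5, Inf] -> poor
df$acc_level <- cut(abs(df$pred_val - df$actual_val),
                    breaks = c(-Inf, 2, 5, Inf),
                    labels = c("good", "medium", "poor"))
df
```

Keeping acc_level as a factor preserves the good < medium < poor level order, which can be handy for sorting or tabulating; wrap it in as.character() if a plain string column is preferred.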
Answer:
Here is another option using fcase from data.table:
library(data.table)
setDT(df)[, acc_level := fcase(
abs(pred_val - actual_val) <= 2, "good",
abs(pred_val - actual_val) <= 5, "medium",
default = "poor"
), by = group]
Output
group pred_val actual_val acc_level
<char> <num> <int> <char>
1: A 22.52 21 good
2: A 21.87 21 good
3: A 31.45 21 poor
4: A 21.45 21 good
5: A 19.99 21 good
6: B 13.96 16 medium
7: B 15.97 16 good
8: B 6.50 16 poor
9: B 19.89 16 medium