Home > Enterprise >  R Lag Variable And Skip Value Between
R Lag Variable And Skip Value Between

Time:11-24

DATA = data.frame(STUDENT = c(1,1,1,2,2,2,3,3,4,4),
                  SCORE = c(6,4,8,10,9,0,2,3,3,7),
                  CLASS = c('A', 'B', 'C', 'A', 'B', 'C', 'B', 'C', 'A', 'B'),
                  WANT = c(NA, NA, 2, NA, NA, -10, NA, NA, NA, NA))

I have DATA and wish to create 'WANT' which is calculate by:

For each STUDENT, find the SCORE where SCORE equals to SCORE(CLASS = C) - SCORE(CLASS = A)

EX: SCORE(STUDENT = 1, CLASS = C) - SCORE(STUDENT = 1, CLASS = A) = 8-6=2

CodePudding user response:

Assuming at most one 'C' and 'A' CLASS per each 'STUDENT', just subset the 'SCORE' where the CLASS value is 'C', 'A', do the subtraction and assign the value only to position where CLASS is 'C' by making all other positions to NA (after grouping by 'STUDENT')

library(dplyr)
DATA <- DATA %>%
   group_by(STUDENT) %>% 
   mutate(WANT2 = (SCORE[CLASS == 'C'][1] - SCORE[CLASS == 'A'][1]) * 
       NA^(CLASS != "C")) %>%
   ungroup

-output

# A tibble: 10 × 5
   STUDENT SCORE CLASS  WANT WANT2
     <dbl> <dbl> <chr> <dbl> <dbl>
 1       1     6 A        NA    NA
 2       1     4 B        NA    NA
 3       1     8 C         2     2
 4       2    10 A        NA    NA
 5       2     9 B        NA    NA
 6       2     0 C       -10   -10
 7       3     2 B        NA    NA
 8       3     3 C        NA    NA
 9       4     3 A        NA    NA
10       4     7 B        NA    NA

CodePudding user response:

Here is a solution with the data organized in a wider format first, then a longer format below. This solution works regardless of the order of the "CLASS" column (for instance, if there is one instance in which the CLASS order is CBA or BCA instead os ABC, this solution will work).

Solution

library(dplyr)
library(tidyr)

wider <- DATA %>% select(-WANT) %>% 
  pivot_wider( names_from = "CLASS", values_from = "SCORE") %>% 
  rowwise() %>% 
  mutate(WANT = C-A) %>% 
  ungroup()

output wider

# A tibble: 4 × 5
  STUDENT     A     B     C  WANT
    <dbl> <dbl> <dbl> <dbl> <dbl>
1       1     6     4     8     2
2       2    10     9     0   -10
3       3    NA     2     3    NA
4       4     3     7    NA    NA

If you really want like your output example, then we can reorganize the wider data this way:

Reorganizing wider to long format

wider %>% 
  pivot_longer(A:C, values_to = "SCORE", names_to = "CLASS") %>% 
  relocate(WANT, .after = SCORE) %>% 
  mutate(WANT = if_else(CLASS == "C", WANT, NA_real_))

Final Output

# A tibble: 12 × 4
   STUDENT CLASS SCORE  WANT
     <dbl> <chr> <dbl> <dbl>
 1       1 A         6    NA
 2       1 B         4    NA
 3       1 C         8     2
 4       2 A        10    NA
 5       2 B         9    NA
 6       2 C         0   -10
 7       3 A        NA    NA
 8       3 B         2    NA
 9       3 C         3    NA
10       4 A         3    NA
11       4 B         7    NA
12       4 C        NA    NA
  • Related