How to look up a value by matching information and create a new variable with the same data in R?

Time:04-27

I have a data frame of monthly records with the columns YYYYMM, ID, alpha_CLASS, num_Class, and TYPE.

I am trying to create a new variable (oldType) that holds last month's TYPE for the row with the same ID, alpha_CLASS, and num_Class.

Is it possible to implement this in R?

Your input would be really appreciated.

CodePudding user response:

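You can group_by the columns that contain the same identifying information (i.e., ID, alpha_CLASS, and num_Class), then use lag to get the previous month's TYPE. Note that lag returns the value from the previous row within each group, so it only gives last month's TYPE if the rows are in month order; passing order_by = YYYYMM makes that ordering explicit.

library(dplyr)

df %>% 
  # rows sharing these three columns describe the same item across months
  group_by(ID, alpha_CLASS, num_Class) %>% 
  # previous TYPE within each group, ordered by month
  mutate(oldType = lag(TYPE, order_by = YYYYMM)) %>% 
  ungroup()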

Output

   YYYYMM    ID alpha_CLASS num_Class  TYPE oldType
    <dbl> <dbl> <chr>       <chr>     <dbl>   <dbl>
 1 202101    11 A           I             9      NA
 2 202101    11 B           II            6      NA
 3 202101    11 C           III           4      NA
 4 202101    11 D           IV            0      NA
 5 202101    11 E           V             8      NA
 6 202101    11 F           VI            7      NA
 7 202102    11 C           III           5       4
 8 202102    11 B           II            6       6
 9 202102    11 F           VI            7       7
10 202102    11 D           IV            0       0
11 202102    11 E           V             8       8
12 202102    11 A           I             9       9
13 202103    11 B           II            6       6
14 202103    11 F           VI            7       7
15 202103    11 A           I             9       9
16 202103    11 C           III           5       5
17 202103    11 E           V             8       8
18 202103    11 D           IV            0       0
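
If you want the lookup to match "last month" literally, rather than "the previous row in the group", a base R sketch along these lines might help. It is only an illustration: df_small is a cut-down copy of the data, prev_month is a hypothetical helper I made up for the example, and it assumes YYYYMM is numeric and that each ID / alpha_CLASS / num_Class combination appears at most once per month.

```r
# Illustrative subset of the data in this post
df_small <- data.frame(
  YYYYMM      = c(202101, 202101, 202102, 202102, 202103, 202103),
  ID          = c(11, 11, 11, 11, 11, 11),
  alpha_CLASS = c("A", "C", "C", "A", "A", "C"),
  num_Class   = c("I", "III", "III", "I", "I", "III"),
  TYPE        = c(9, 4, 5, 9, 9, 5)
)

# Hypothetical helper: previous calendar month for a numeric YYYYMM value,
# rolling January back to December of the year before
prev_month <- function(yyyymm) {
  ifelse(yyyymm %% 100 == 1, yyyymm - 89, yyyymm - 1)
}

# One key describing each row, and one key describing the row to look up
key_now  <- with(df_small, paste(YYYYMM, ID, alpha_CLASS, num_Class))
key_prev <- with(df_small, paste(prev_month(YYYYMM), ID, alpha_CLASS, num_Class))

# match() gives the position of last month's row, or NA when there is none
df_small$oldType <- df_small$TYPE[match(key_prev, key_now)]

print(df_small)
```

Unlike lag, this leaves oldType as NA when a month is missing for an item, instead of silently picking up the value from an earlier month.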

Or with data.table:

library(data.table)
dt <- as.data.table(df)

# shift() with type = "lag" returns the previous row's TYPE within each group
dt[, oldType := shift(TYPE, type = "lag"),
   by = c("ID", "alpha_CLASS", "num_Class")]
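
As with lag, shift works on row position, so the result depends on the rows being in month order. A minimal sketch of sorting first, using a made-up three-row table (dt_small) that is deliberately out of order:

```r
library(data.table)

# Illustrative rows for one item, deliberately not in chronological order
dt_small <- data.table(
  YYYYMM      = c(202103, 202101, 202102),
  ID          = c(11, 11, 11),
  alpha_CLASS = c("C", "C", "C"),
  num_Class   = c("III", "III", "III"),
  TYPE        = c(5, 4, 5)
)

# Sort by reference so each group runs from the oldest to the newest month
setorder(dt_small, ID, alpha_CLASS, num_Class, YYYYMM)

# shift() defaults to n = 1 and type = "lag"
dt_small[, oldType := shift(TYPE), by = .(ID, alpha_CLASS, num_Class)]

print(dt_small)
```

setorder reorders the table in place, so skip it (or work on a copy) if the original row order matters to you.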

Data

df <- structure(list(YYYYMM = c(202101, 202101, 202101, 202101, 202101, 
202101, 202102, 202102, 202102, 202102, 202102, 202102, 202103, 
202103, 202103, 202103, 202103, 202103), ID = c(11, 11, 11, 11, 
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11), alpha_CLASS = c("A", 
"B", "C", "D", "E", "F", "C", "B", "F", "D", "E", "A", "B", "F", 
"A", "C", "E", "D"), num_Class = c("I", "II", "III", "IV", "V", 
"VI", "III", "II", "VI", "IV", "V", "I", "II", "VI", "I", "III", 
"V", "IV"), TYPE = c(9, 6, 4, 0, 8, 7, 5, 6, 7, 0, 8, 9, 6, 7, 
9, 5, 8, 0)), class = "data.frame", row.names = c(NA, -18L))