I have a dataframe below.
I am trying to create a new variable (oldType) by matching each row to last month's row with the same ID, alpha_CLASS, and num_Class, and taking that row's TYPE.
Is it possible to implement this in R?
Your input would be really appreciated.
CodePudding user response:
You can group_by the columns that contain the same identifying information (i.e., ID, alpha_CLASS, and num_Class), then use lag to get the previous month's TYPE.
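library(dplyr)

df %>%
  # rows that share these keys belong to the same series
  group_by(ID, alpha_CLASS, num_Class) %>%
  # previous row's TYPE within each series; order_by guards against unsorted months
  mutate(oldType = lag(TYPE, order_by = YYYYMM)) %>%
  ungroup()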
Output
   YYYYMM    ID alpha_CLASS num_Class  TYPE oldType
    <dbl> <dbl> <chr>       <chr>     <dbl>   <dbl>
 1 202101    11 A           I             9      NA
 2 202101    11 B           II            6      NA
 3 202101    11 C           III           4      NA
 4 202101    11 D           IV            0      NA
 5 202101    11 E           V             8      NA
 6 202101    11 F           VI            7      NA
 7 202102    11 C           III           5       4
 8 202102    11 B           II            6       6
 9 202102    11 F           VI            7       7
10 202102    11 D           IV            0       0
11 202102    11 E           V             8       8
12 202102    11 A           I             9       9
13 202103    11 B           II            6       6
14 202103    11 F           VI            7       7
15 202103    11 A           I             9       9
16 202103    11 C           III           5       5
17 202103    11 E           V             8       8
18 202103    11 D           IV            0       0
Or with data.table:
library(data.table)

dt <- as.data.table(df)
# shift() lags TYPE by one row within each group; rows are assumed sorted by YYYYMM
dt[, oldType := shift(TYPE, type = "lag"),
   by = c("ID", "alpha_CLASS", "num_Class")]
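Note that both versions above take the previous row within each group, which is last month's row only when no month is missing for that group. If gaps are possible, one option is to look up the strictly preceding calendar month instead. The following is only a rough base R sketch of that idea: df_gap is a made-up data set (class B has no 202102 row), and prev_month is a hypothetical helper, neither of which comes from the question.

```r
# Hypothetical example data: class B has no row for 202102
df_gap <- data.frame(
  YYYYMM      = c(202012, 202012, 202101, 202101, 202102, 202103),
  ID          = 11,
  alpha_CLASS = c("A", "B", "A", "B", "A", "B"),
  num_Class   = c("I", "II", "I", "II", "I", "II"),
  TYPE        = c(9, 6, 9, 5, 3, 6)
)

# Hypothetical helper: YYYYMM of the preceding month (January rolls back to December)
prev_month <- function(yyyymm) {
  ifelse(yyyymm %% 100 == 1, yyyymm - 89, yyyymm - 1)
}

# Key for each row, and the key its "last month" counterpart would have
# (assumes "|" does not occur in the class labels)
key      <- paste(df_gap$YYYYMM, df_gap$ID,
                  df_gap$alpha_CLASS, df_gap$num_Class, sep = "|")
prev_key <- paste(prev_month(df_gap$YYYYMM), df_gap$ID,
                  df_gap$alpha_CLASS, df_gap$num_Class, sep = "|")

# match() returns NA where no row exists for the preceding month,
# so oldType is NA there instead of an older month's TYPE
df_gap$oldType <- df_gap$TYPE[match(prev_key, key)]
```

Here the 202103 row for class B gets NA, whereas a plain lag would have returned the 202101 value.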
Data
df <- structure(list(YYYYMM = c(202101, 202101, 202101, 202101, 202101,
202101, 202102, 202102, 202102, 202102, 202102, 202102, 202103,
202103, 202103, 202103, 202103, 202103), ID = c(11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11), alpha_CLASS = c("A",
"B", "C", "D", "E", "F", "C", "B", "F", "D", "E", "A", "B", "F",
"A", "C", "E", "D"), num_Class = c("I", "II", "III", "IV", "V",
"VI", "III", "II", "VI", "IV", "V", "I", "II", "VI", "I", "III",
"V", "IV"), TYPE = c(9, 6, 4, 0, 8, 7, 5, 6, 7, 0, 8, 9, 6, 7,
9, 5, 8, 0)), class = "data.frame", row.names = c(NA, -18L))