I've created this dummy data frame that mirrors my real data. For simplicity, I've dropped the Time column:
library(tibble)

df <- tibble(ID    = c(1, 2, 1, 3, 4, 1, 2, 3),
             level = c(0, 0, 1, 2, 1, 2, 3, 0),
             n_0   = 4,
             n_1   = 0,
             n_2   = 0,
             n_3   = 0,
             previous_level = c(0, 0, 0, 0, 0, 1, 0, 2))
ID level n_0 n_1 n_2 n_3 previous_level
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 4 0 0 0 0
2 2 0 4 0 0 0 0
3 1 1 4 0 0 0 0
4 3 2 4 0 0 0 0
5 4 1 4 0 0 0 0
6 1 2 4 0 0 0 1
7 2 3 4 0 0 0 0
8 3 0 4 0 0 0 2
Some words to explain this structure. The actual data comprises only the ID and level columns. A specific ID can only have one level at a time; however, that level may change over time. All IDs start at level 0. Now I want columns that track how many of my IDs (here 4 in total) are at levels 0, 1, 2 and 3, which is why I've already created the count columns n_0 to n_3. I also think a column with the previous level might be helpful.
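The previous_level column can be derived from ID and level alone. A minimal base-R sketch, assuming the rows are already sorted by the dropped Time column and that every ID starts at level 0; add_previous_level and dat are hypothetical names used only for illustration:

```r
# Hypothetical helper: for each row, the level that row's ID had just before it.
# Assumes rows are in time order and every ID starts at start_level (0).
add_previous_level <- function(df, start_level = 0) {
  df$previous_level <- ave(
    df$level, df$ID,
    # within each ID: prepend the start level and drop the last level
    FUN = function(x) c(start_level, head(x, -1))
  )
  df
}

dat <- data.frame(
  ID    = c(1, 2, 1, 3, 4, 1, 2, 3),
  level = c(0, 0, 1, 2, 1, 2, 3, 0)
)
dat <- add_previous_level(dat)
dat$previous_level
# [1] 0 0 0 0 0 1 0 2
```

With dplyr, a roughly equivalent approach is group_by(ID) followed by mutate(previous_level = lag(level, default = 0)) and ungroup().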
The following table shows the result I'm expecting:
ID level n_0 n_1 n_2 n_3 previous_level
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 4 0 0 0 0
2 2 0 4 0 0 0 0
3 1 1 3 1 0 0 0
4 3 2 2 1 1 0 0
5 4 1 1 2 1 0 0
6 1 2 1 1 2 0 1
7 2 3 0 1 2 1 0
8 3 0 1 1 1 1 2
Is there a sneaky way to do so in R?
Answer 1:
You may try the following, which keeps track of the current level of every ID and tabulates those levels after each row:
fn <- function(df) {
  lvls <- sort(unique(df$level))  # all levels that occur, e.g. 0, 1, 2, 3
  ids  <- sort(unique(df$ID))     # all distinct IDs
  # current level of every ID; all IDs start at the lowest level (0)
  key <- factor(rep(lvls[1], length(ids)), levels = lvls)
  res <- matrix(0L, nrow = nrow(df), ncol = length(lvls))
  for (i in seq_len(nrow(df))) {
    # update the level of the ID recorded in row i
    key[match(df$ID[i], ids)] <- df$level[i]
    # count how many IDs are currently at each level
    res[i, ] <- as.integer(table(key))
  }
  colnames(res) <- paste0("n_", lvls)
  cbind(df, res)
}
fn(df)
ID level previous_level n_0 n_1 n_2 n_3
1 1 0 0 4 0 0 0
2 2 0 0 4 0 0 0
3 1 1 0 3 1 0 0
4 3 2 0 2 1 1 0
5 4 1 0 1 2 1 0
6 1 2 1 1 1 2 0
7 2 3 0 0 1 2 1
8 3 0 2 1 1 1 1
Answer 2:
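library(dplyr)     # n_distinct()
library(magrittr)  # %<>%, add(), subtract()

# df must be a plain data.frame (see Data below): on a tibble,
# df[i, 'level'] returns a 1x1 tibble that cannot be used as an index
n_states = 4L
# state[k + 1] holds the number of IDs currently at level k; all IDs start at level 0
state = vector(mode = 'numeric', length = n_states)
state[1L] = n_distinct(df$ID)
for (i in seq_len(nrow(df))) {
  # the ID in row i leaves its previous level ...
  state[df[i, 'previous_level'] + 1] %<>% subtract(1)
  # ... and enters its new level (levels are 0-based, R indices are 1-based)
  state[df[i, 'level'] + 1] %<>% add(1)
  df[i, paste0('n', seq_len(n_states) - 1L)] = state
}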
# ID level previous_level n0 n1 n2 n3
# 1 1 0 0 4 0 0 0
# 2 2 0 0 4 0 0 0
# 3 1 1 0 3 1 0 0
# 4 3 2 0 2 1 1 0
# 5 4 1 0 1 2 1 0
# 6 1 2 1 1 1 2 0
# 7 2 3 0 0 1 2 1
# 8 3 0 2 1 1 1 1
Data:
df <- data.frame(
ID = c(1, 2, 1, 3, 4, 1, 2, 3),
level = c(0, 0, 1, 2, 1, 2, 3, 0),
previous_level = c(0, 0, 0, 0, 0, 1, 0, 2)
)
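The loop in the second answer only ever adds one to the count of level and subtracts one from the count of previous_level, so each count column is a running total and can be computed with cumsum() instead of a loop. A sketch under the same assumptions (rows in time order, every ID starts at level 0, previous_level already correct); count_levels is a hypothetical helper name:

```r
# Hypothetical helper: running number of IDs at each level after every row.
# n_k = (all IDs if k is the start level, else 0)
#       + rows so far that moved an ID into level k
#       - rows so far that moved an ID out of level k
# A row with level == previous_level adds and subtracts 1, so it cancels out.
count_levels <- function(df, lvls = 0:3, start_level = 0) {
  n_ids <- length(unique(df$ID))
  for (k in lvls) {
    start <- if (k == start_level) n_ids else 0
    df[[paste0("n_", k)]] <- start +
      cumsum(df$level == k) - cumsum(df$previous_level == k)
  }
  df
}

dat <- data.frame(
  ID             = c(1, 2, 1, 3, 4, 1, 2, 3),
  level          = c(0, 0, 1, 2, 1, 2, 3, 0),
  previous_level = c(0, 0, 0, 0, 0, 1, 0, 2)
)
count_levels(dat)
```

On this data the n_0 to n_3 columns match the expected table in the question. If other levels can occur, pass them via lvls.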