R: Count frequencies of levels across whole time series


I've created this dummy dataframe that represents my real data. For simplicity, I've dropped the Time column:

library(tibble)

df <- tibble(ID = c(1, 2, 1, 3, 4, 1, 2, 3),
             level = c(0, 0, 1, 2, 1, 2, 3, 0),
             n_0 = 4,
             n_1 = 0,
             n_2 = 0,
             n_3 = 0,
             previous_level = c(0, 0, 0, 0, 0, 1, 0, 2))

     ID level   n_0   n_1   n_2   n_3 previous_level
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>          <dbl>
1     1     0     4     0     0     0              0
2     2     0     4     0     0     0              0
3     1     1     4     0     0     0              0
4     3     2     4     0     0     0              0
5     4     1     4     0     0     0              0
6     1     2     4     0     0     0              1
7     2     3     4     0     0     0              0
8     3     0     4     0     0     0              2

Some words to explain this structure. The actual data comprises only the ID and level columns. An ID can only have one level at a time, but that level might change over time. All IDs start with level 0. Now I want columns that track how many of my IDs (here 4 in total) have levels 0, 1, 2 and 3 at each row. Therefore I've already created the count columns. Also, I think a column with the previous level might be helpful.
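Since the real data only holds ID and level, the previous_level helper column would have to be derived first. A minimal base R sketch of one way to do that, assuming the rows are already in time order and every ID starts at level 0:

```r
# Only ID and level are given (values copied from the dummy data above).
df <- data.frame(
  ID    = c(1, 2, 1, 3, 4, 1, 2, 3),
  level = c(0, 0, 1, 2, 1, 2, 3, 0)
)

# Within each ID, shift the level history down by one row.
# Assumption: rows are in time order and the starting level is 0.
df$previous_level <- ave(df$level, df$ID,
                         FUN = function(x) c(0, head(x, -1)))

print(df$previous_level)
```

For the dummy data this reproduces the previous_level column shown above.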

The following table shows the result I'm expecting:

     ID level   n_0   n_1   n_2   n_3 previous_level
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>          <dbl>
1     1     0     4     0     0     0              0
2     2     0     4     0     0     0              0
3     1     1     3     1     0     0              0
4     3     2     2     1     1     0              0
5     4     1     1     2     1     0              0
6     1     2     1     1     2     0              1
7     2     3     0     1     2     1              0
8     3     0     1     1     1     1              2

Is there a sneaky way to do so in R?

CodePudding user response:

You may try

fn <- function(df){
  # One count column per distinct level, one row per observation
  res <- as.data.frame(matrix(0, ncol = length(unique(df$level)), nrow = nrow(df)))
  # Current level of every ID (all start at 0); assumes IDs are 1, 2, ..., n
  key <- factor(rep(0, length(unique(df$ID))), levels = sort(unique(df$level)))
  for (i in seq_len(nrow(df))){
    key[df$ID[i]] <- df$level[i]  # record the current level of this ID
    res[i,] <- table(key)         # number of IDs per level after this row
  }
  names(res) <- paste0("n_",levels(key))
  cbind(df, res)
}

fn(df)

  ID level previous_level n_0 n_1 n_2 n_3
1  1     0              0   4   0   0   0
2  2     0              0   4   0   0   0
3  1     1              0   3   1   0   0
4  3     2              0   2   1   1   0
5  4     1              0   1   2   1   0
6  1     2              1   1   1   2   0
7  2     3              0   0   1   2   1
8  3     0              2   1   1   1   1

CodePudding user response:


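library(dplyr)     # n_distinct()
library(magrittr)  # %<>%, add(), subtract()

# df must be a plain data.frame so that df[i, 'level'] returns a scalar
n_states  = 4L
state     = vector(mode = 'numeric', length = n_states)
state[1L] = n_distinct(df$ID)  # every ID starts at level 0

for (i in seq_len(nrow(df))) {
  state[df[i, 'previous_level'] + 1] %<>% subtract(1)  # ID leaves its previous level
  state[df[i, 'level'] + 1]          %<>% add(1)       # ID enters its new level
  df[i, paste0('n', seq_len(n_states) - 1L)] = state
}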

#   ID level previous_level n0 n1 n2 n3
# 1  1     0              0  4  0  0  0
# 2  2     0              0  4  0  0  0
# 3  1     1              0  3  1  0  0
# 4  3     2              0  2  1  1  0
# 5  4     1              0  1  2  1  0
# 6  1     2              1  1  1  2  0
# 7  2     3              0  0  1  2  1
# 8  3     0              2  1  1  1  1


Data:

df <- data.frame(
  ID = c(1, 2, 1, 3, 4, 1, 2, 3),
  level = c(0, 0, 1, 2, 1, 2, 3, 0),
  previous_level = c(0, 0, 0, 0, 0, 1, 0, 2)
)
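
A loop-free variant may also be possible. The sketch below is not taken from either answer; it assumes the rows are in time order, that previous_level is already available, and that the possible levels are 0 to 3 (the name `possible_levels` is made up for illustration). For each level, the running count is the starting count plus the cumulative number of IDs entering that level minus those leaving it:

```r
df <- data.frame(
  ID             = c(1, 2, 1, 3, 4, 1, 2, 3),
  level          = c(0, 0, 1, 2, 1, 2, 3, 0),
  previous_level = c(0, 0, 0, 0, 0, 1, 0, 2)
)

possible_levels <- 0:3                    # assumption: levels are known up front
n_ids           <- length(unique(df$ID))

# One column per level: start value + cumulative (entries - exits).
counts <- sapply(possible_levels, function(k) {
  start <- if (k == 0) n_ids else 0       # all IDs start at level 0
  start + cumsum((df$level == k) - (df$previous_level == k))
})
colnames(counts) <- paste0("n_", possible_levels)

result <- cbind(df, counts)
print(result)
```

On the dummy data this gives the same n_0 to n_3 columns as the expected table in the question. Rows where level equals previous_level contribute +1 and -1 to the same column and therefore cancel out.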