Home > OS >  How to number each element in column conditionally on elements in other column in a dataset
How to number each element in column conditionally on elements in other column in a dataset

Time:07-20

I have a large dataset with thousands of measurements. What I want is to assign a visit number to each measurement so that all three consecutive measurements fall under the same visit number. After three consecutive measurements, the visit number increases. So the first three measurements are visit 1, the fourth to sixth measurements are visit 2, and so on. When there are only two or less measurements left, I want to mark the visit as missing.

Example dataset

DF <- data.frame(ID = rep("ID01", 10),
                 M = 1:10)

What I want:

DF$V <- c(rep(1:3, each = 3), NA)

Is there a way to this automatically?

Thanks for any help.

Update: What if each measurement contains numerous other measurements? So that:

DF <- data.frame(ID = rep("ID01", 50),
                 M0 = sample(50),
                 M = rep(1:10, each = 5))

What I want:

DF$V <- c(rep(rep(1:3, each = 3), each = 5), rep(NA, 5))

Even when the length of each level of DF$M changes (and thus is not fixed at n <- 15). E.g. length(DF$M == 1) = 21, length(DF$M == 2) = 26 etc.

Again, thanks for any help.

CodePudding user response:

A possible solution, which, thanks to a comment of @DarrenTsai, is now more concise (thanks, @DarrenTsai!):

library(dplyr)

n <- 15

DF %>% 
  group_by(ID) %>% 
  mutate(V = rep(1:(n() %/% n), each = n)[1:n()]) %>% 
  ungroup

#>      ID M0  M  V
#> 1  ID01 20  1  1
#> 2  ID01 13  1  1
#> 3  ID01 41  1  1
#> 4  ID01 21  1  1
#> 5  ID01 45  1  1
#> 6  ID01 10  2  1
#> 7  ID01 17  2  1
#> 8  ID01 43  2  1
#> 9  ID01  5  2  1
#> 10 ID01  4  2  1
#> 11 ID01 37  3  1
#> 12 ID01 22  3  1
#> 13 ID01 14  3  1
#> 14 ID01 23  3  1
#> 15 ID01 39  3  1
#> 16 ID01 33  4  2
#> 17 ID01 42  4  2
#> 18 ID01 26  4  2
#> 19 ID01 31  4  2
#> 20 ID01  1  4  2
#> 21 ID01 48  5  2
#> 22 ID01 49  5  2
#> 23 ID01 18  5  2
#> 24 ID01 29  5  2
#> 25 ID01  2  5  2
#> 26 ID01 15  6  2
#> 27 ID01  8  6  2
#> 28 ID01 32  6  2
#> 29 ID01  7  6  2
#> 30 ID01 27  6  2
#> 31 ID01 11  7  3
#> 32 ID01  9  7  3
#> 33 ID01 36  7  3
#> 34 ID01 50  7  3
#> 35 ID01 34  7  3
#> 36 ID01 40  8  3
#> 37 ID01 24  8  3
#> 38 ID01 16  8  3
#> 39 ID01 46  8  3
#> 40 ID01  3  8  3
#> 41 ID01 47  9  3
#> 42 ID01 19  9  3
#> 43 ID01 28  9  3
#> 44 ID01  6  9  3
#> 45 ID01 38  9  3
#> 46 ID01 35 10 NA
#> 47 ID01 25 10 NA
#> 48 ID01 44 10 NA
#> 49 ID01 12 10 NA
#> 50 ID01 30 10 NA

UPDATED

The following solution is to work in the case of variable lengths of the levels of DF$M. This solution is based on the following ideas:

  1. Calculate the maximum of rows across all groups of M.

  2. For each group of M, append rows to match the maximum of rows referred to above.

  3. Use the previous solution (whose code is above) to accomplish the OP goal.

library(dplyr)

DF <- DF %>% 
  slice(-30) # removes row 30, to force variable lengths in df$M

DF %>% 
  mutate(idaux = row_number()) %>% 
  add_count(M, name = "aux") %T>%
  {m <<- max(.$aux)} %>% 
  group_by(M) %>% 
  slice(c(1:n(), rep(n(), m - n()))) %>% 
  ungroup %>% 
  group_by(ID) %>% 
  mutate(V = rep(1:(n() %/% (3*m)), each = 3*m)[1:n()]) %>% 
  ungroup %>% 
  distinct %>% 
  select(ID, M0, M, V) %>% 
  as.data.frame()

#>      ID M0  M  V
#> 1  ID01 18  1  1
#> 2  ID01 22  1  1
#> 3  ID01  3  1  1
#> 4  ID01 17  1  1
#> 5  ID01 40  1  1
#> 6  ID01 20  2  1
#> 7  ID01 48  2  1
#> 8  ID01 39  2  1
#> 9  ID01 25  2  1
#> 10 ID01 49  2  1
#> 11 ID01 42  3  1
#> 12 ID01 36  3  1
#> 13 ID01 11  3  1
#> 14 ID01  5  3  1
#> 15 ID01 37  3  1
#> 16 ID01 30  4  2
#> 17 ID01 45  4  2
#> 18 ID01  1  4  2
#> 19 ID01 50  4  2
#> 20 ID01 46  4  2
#> 21 ID01 15  5  2
#> 22 ID01 16  5  2
#> 23 ID01 47  5  2
#> 24 ID01 14  5  2
#> 25 ID01 27  5  2
#> 26 ID01  8  6  2
#> 27 ID01 34  6  2
#> 28 ID01  9  6  2
#> 29 ID01  7  6  2
#> 30 ID01 43  7  3
#> 31 ID01 24  7  3
#> 32 ID01 29  7  3
#> 33 ID01 13  7  3
#> 34 ID01 23  7  3
#> 35 ID01 26  8  3
#> 36 ID01  2  8  3
#> 37 ID01 21  8  3
#> 38 ID01 38  8  3
#> 39 ID01 28  8  3
#> 40 ID01  6  9  3
#> 41 ID01 44  9  3
#> 42 ID01 19  9  3
#> 43 ID01 32  9  3
#> 44 ID01  4  9  3
#> 45 ID01 12 10 NA
#> 46 ID01 10 10 NA
#> 47 ID01 35 10 NA
#> 48 ID01 33 10 NA
#> 49 ID01 41 10 NA

CodePudding user response:

A {data.table} solution:

library(data.table)

setDT(DF)
n <- 15
DF[
  , 
  V := c(rep(x = seq_len(.N %/% n), each = n), rep(NA, times = .N %% n)), 
  by = "ID"
]
DF
#>       ID M0  M  V
#>  1: ID01 50  1  1
#>  2: ID01 30  1  1
#>  3: ID01 34  1  1
#>  4: ID01  2  1  1
#>  5: ID01 39  1  1
#>  6: ID01 15  2  1
#>  7: ID01 41  2  1
#>  8: ID01 24  2  1
#>  9: ID01 47  2  1
#> 10: ID01 49  2  1
#> 11: ID01  8  3  1
#> 12: ID01 42  3  1
#> 13: ID01 46  3  1
#> 14: ID01 28  3  1
#> 15: ID01  1  3  1
#> 16: ID01  4  4  2
#> 17: ID01 45  4  2
#> 18: ID01 43  4  2
#> 19: ID01 37  4  2
#> 20: ID01 26  4  2
#> 21: ID01 13  5  2
#> 22: ID01 20  5  2
#> 23: ID01 27  5  2
#> 24: ID01 22  5  2
#> 25: ID01 38  5  2
#> 26: ID01 10  6  2
#> 27: ID01 12  6  2
#> 28: ID01 48  6  2
#> 29: ID01 35  6  2
#> 30: ID01 44  6  2
#> 31: ID01 31  7  3
#> 32: ID01 14  7  3
#> 33: ID01 40  7  3
#> 34: ID01 23  7  3
#> 35: ID01 19  7  3
#> 36: ID01  3  8  3
#> 37: ID01 21  8  3
#> 38: ID01  5  8  3
#> 39: ID01  9  8  3
#> 40: ID01  7  8  3
#> 41: ID01 25  9  3
#> 42: ID01 36  9  3
#> 43: ID01 33  9  3
#> 44: ID01 29  9  3
#> 45: ID01 17  9  3
#> 46: ID01 11 10 NA
#> 47: ID01 16 10 NA
#> 48: ID01  6 10 NA
#> 49: ID01 32 10 NA
#> 50: ID01 18 10 NA
#>       ID M0  M  V

Created on 2022-07-20 by the reprex package (v2.0.1)

CodePudding user response:

we can use rle function

r <- rle(DF$M)$lengths
l <- unlist(Map(\(x,y) rep(x , each = 3*y) , 1:3 , r))[1:(nrow(DF)-r[length(r)])]

repna <- rep(NA , nrow(DF) - length(l))

DF$v <- c(l , repna)
  • output
     ID M0  M  v
1  ID01  3  1  1
2  ID01 47  1  1
3  ID01 46  1  1
4  ID01 11  1  1
5  ID01 37  1  1
6  ID01 18  2  1
7  ID01 29  2  1
8  ID01 16  2  1
9  ID01 32  2  1
10 ID01  2  2  1
11 ID01 21  3  1
12 ID01 31  3  1
13 ID01 19  3  1
14 ID01 17  3  1
15 ID01 41  3  1
16 ID01 30  4  2
17 ID01 22  4  2
18 ID01  5  4  2
19 ID01 44  4  2
20 ID01 43  4  2
21 ID01 27  5  2
22 ID01 23  5  2
23 ID01 33  5  2
24 ID01 26  5  2
25 ID01 38  5  2
26 ID01 20  6  2
27 ID01 39  6  2
28 ID01 50  6  2
29 ID01 40  6  2
30 ID01 28  6  2
31 ID01 35  7  3
32 ID01  6  7  3
33 ID01 24  7  3
34 ID01 14  7  3
35 ID01 42  7  3
36 ID01 48  8  3
37 ID01  9  8  3
38 ID01 49  8  3
39 ID01  7  8  3
40 ID01  1  8  3
41 ID01  4  9  3
42 ID01 13  9  3
43 ID01 10  9  3
44 ID01 34  9  3
45 ID01 45  9  3
46 ID01 36 10 NA
47 ID01 12 10 NA
48 ID01  8 10 NA
49 ID01 25 10 NA
50 ID01 15 10 NA
  •  Tags:  
  • r
  • Related