assign ID based on duplicate integer variable and logical variable

Time:03-15

I'm having trouble assigning a new ID to rows based on the values of two other columns.

old_ID       N      n1
1            1      FALSE
2            1      FALSE
3            12     FALSE
4            12     FALSE
5            3      FALSE
6            4      FALSE
7            5      TRUE
8            5      TRUE
9            6      FALSE
10           7      FALSE
sample <- data.frame(old_ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
                     N = c(1, 1, 12, 12, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11),
                     n1 = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE))

Column N is class integer, column n1 is class logical. There are 3 possible conditions:

  1. If N is duplicated and n1 is FALSE, I would like the new_ID column to show a duplicate number as well.
  2. If N is duplicated and n1 is TRUE, then new_ID should show a unique number.
  3. If N is unique, then new_ID should show a unique number.

Desired output:

old_ID       N      n1       new_ID
1            1      FALSE     1
2            1      FALSE     1
3            12     FALSE     2
4            12     FALSE     2
5            3      FALSE     3
6            4      FALSE     4
7            5      TRUE      5
8            5      TRUE      6
9            6      FALSE     7
10           7      FALSE     8

This question is part of a larger question I asked here (Assign ID column based on multiple columns). However, as Kévin Legueult suggested, I think I first need to solve this step: creating a new variable/column for this condition.

CodePudding user response:

Here's a way with data.table::rleid:

sample$id <- with(sample, data.table::rleid(N + cumsum(n1)))

#> sample
   old_ID  N    n1 id
1       1  1 FALSE  1
2       2  1 FALSE  1
3       3 12 FALSE  2
4       4 12 FALSE  2
5       5  3 FALSE  3
6       6  4 FALSE  4
7       7  5  TRUE  5
8       8  5  TRUE  6
9       9  6 FALSE  7
10     10  7 FALSE  8
11     11  8 FALSE  9
12     12  9 FALSE 10
13     13 10 FALSE 11
14     14 11 FALSE 12
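
To see why adding cumsum(n1) to N works here, it may help to look at the intermediate vectors. The following is a minimal base-R sketch using the sample data from the question. The names bump, key, runs and id are illustrative only, and the run-length step stands in for data.table::rleid so the snippet runs without extra packages.

```r
# Rebuild the sample data from the question.
sample <- data.frame(
  old_ID = 1:14,
  N  = c(1, 1, 12, 12, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11),
  n1 = c(rep(FALSE, 6), TRUE, TRUE, rep(FALSE, 6))
)

# cumsum(n1) goes up by one on every TRUE row and stays flat otherwise.
bump <- cumsum(sample$n1)
# 0 0 0 0 0 0 1 2 2 2 2 2 2 2

# Adding it to N makes consecutive TRUE rows with the same N differ.
key <- sample$N + bump
# 1 1 12 12 3 4 6 7 8 9 10 11 12 13

# Number the runs of equal consecutive key values.
runs <- rle(key)
id <- rep(seq_along(runs$values), runs$lengths)
# 1 1 2 2 3 4 5 6 7 8 9 10 11 12
```

Note that this numbers runs of consecutive equal values, so it assumes duplicated N values sit on adjacent rows, as they do in the sample.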

CodePudding user response:

Or using base R with rle:

sample$id <- with(sample, with(rle(N + cumsum(n1)),
     rep(seq_along(values), lengths)))
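
One possible caveat with keying on the sum of N and cumsum(n1): two adjacent rows with different N can end up with the same sum (for example 5 + 0 and 4 + 1), which would merge them into one ID. The sketch below is an alternative that is not from either answer. assign_id is a hypothetical helper that starts a new ID whenever N differs from the previous row or n1 is TRUE. Like the rle approaches, it assumes duplicated N values are on adjacent rows.

```r
# Hypothetical helper (not from the answers above): start a new ID whenever
# N differs from the previous row, or whenever n1 is TRUE.
assign_id <- function(N, n1) {
  changed <- c(TRUE, N[-1] != N[-length(N)])  # first row always starts an ID
  cumsum(changed | n1)
}

# Made-up two-row input where the sum-based key collides: 5 + 0 == 4 + 1.
N  <- c(5, 4)
n1 <- c(FALSE, TRUE)

runs <- rle(N + cumsum(n1))
sum_based  <- rep(seq_along(runs$values), runs$lengths)  # 1 1: rows merged
flag_based <- assign_id(N, n1)                            # 1 2: rows kept apart

# On the sample data from the question, the helper gives the desired IDs.
sample_N  <- c(1, 1, 12, 12, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11)
sample_n1 <- c(rep(FALSE, 6), TRUE, TRUE, rep(FALSE, 6))
sample_id <- assign_id(sample_N, sample_n1)
```

With data.table, passing the two vectors separately, as in data.table::rleid(N, cumsum(n1)), should avoid the same collision, since rleid accepts several vectors and numbers runs of their combinations.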