I'm having trouble assigning rows a new ID based on the values of two other columns.
old_ID N n1
1 1 FALSE
2 1 FALSE
3 12 FALSE
4 12 FALSE
5 3 FALSE
6 4 FALSE
7 5 TRUE
8 5 TRUE
9 6 FALSE
10 7 FALSE
sample <- data.frame(old_ID = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
N = c(1,1,12,12,3,4,5,5,6,7,8,9,10,11),
n1 = c(FALSE, FALSE,FALSE, FALSE, FALSE, FALSE, TRUE,TRUE,FALSE,FALSE, FALSE,FALSE, FALSE, FALSE))
Column N is class integer, column n1 is class logical. There are 3 possible conditions:
- If N is duplicated and n1 is FALSE, I would like the 'new_ID' column to show a duplicate number as well.
- If N is duplicated and n1 is TRUE, then 'new_ID' should show a unique number for each row.
- If N is unique, then 'new_ID' should show a unique number.
Desired output:
old_ID N n1 new_ID
1 1 FALSE 1
2 1 FALSE 1
3 12 FALSE 2
4 12 FALSE 2
5 3 FALSE 3
6 4 FALSE 4
7 5 TRUE 5
8 5 TRUE 6
9 6 FALSE 7
10 7 FALSE 8
This question is part of a larger question I asked here (Assign ID column based on multiple columns). However, I think, as Kévin Legueult suggested, I first need to solve this part: creating a new variable/column for this condition.
CodePudding user response:
Here's a way with data.table::rleid:
sample$id <- with(sample, data.table::rleid(N, cumsum(n1)))
#> sample
old_ID N n1 id
1 1 1 FALSE 1
2 2 1 FALSE 1
3 3 12 FALSE 2
4 4 12 FALSE 2
5 5 3 FALSE 3
6 6 4 FALSE 4
7 7 5 TRUE 5
8 8 5 TRUE 6
9 9 6 FALSE 7
10 10 7 FALSE 8
11 11 8 FALSE 9
12 12 9 FALSE 10
13 13 10 FALSE 11
14 14 11 FALSE 12
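To see why pairing N with cumsum(n1) gives these IDs, it may help to print the intermediate values. The sketch below uses only base R; the names counter and key are introduced here for illustration and are not part of the answer above.

```r
# Same data as in the question.
sample <- data.frame(
  old_ID = 1:14,
  N  = c(1, 1, 12, 12, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11),
  n1 = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE,
         FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# cumsum(n1) goes up by one at every TRUE row, so two consecutive TRUE rows
# with the same N still get different (N, counter) pairs.
counter <- cumsum(sample$n1)
# counter is 0 0 0 0 0 0 1 2 2 2 2 2 2 2

# Illustrative only: the pair that the run-length ID is effectively computed on.
key <- paste(sample$N, counter)

print(data.frame(sample, counter, key))
```

Two things to keep in mind, as far as I can tell: the ID is based on consecutive runs, so equal N values that are not adjacent would get different IDs, and a FALSE row that directly follows a TRUE row with the same N would share that TRUE row's ID, because the counter does not change on FALSE rows.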
CodePudding user response:
Or using base R with rle:
sample$id <- with(sample, with(rle(paste(N, cumsum(n1))),
  rep(seq_along(values), lengths)))
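If this is needed in more than one place, the base R idea could be wrapped in a small helper and checked against the desired output from the question. This is only a sketch: run_id is a hypothetical name, and combining the two columns with paste() is an assumption about how to build a single key for rle().

```r
# Hypothetical helper: run-length ID over the pair (N, cumsum(n1)).
run_id <- function(N, n1) {
  key <- paste(N, cumsum(n1))          # one string key per row (assumed encoding)
  r <- rle(key)                        # consecutive runs of identical keys
  rep(seq_along(r$values), r$lengths)  # number the runs 1, 2, 3, ...
}

N  <- c(1, 1, 12, 12, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11)
n1 <- c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE,
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

new_ID <- run_id(N, n1)

# The first ten rows should match the desired output in the question.
stopifnot(identical(new_ID[1:10], c(1L, 1L, 2L, 2L, 3L, 4L, 5L, 6L, 7L, 8L)))
```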