I have a problem solving the following question:
the values start and end are repeated in every other rows. I would like to mark them with a unique number, that increases every time a new start value is reached.
I also want rows where the obj column is empty and not inbetween a start and an end, to remain empty in the ID column.
Thanks a lot in advance!
CodePudding user response:
There might be a more streamlined way, but you can use cumsum()
with some logic.
library(dplyr)
library(tidyr)
x <- c("start", "end", NA)
df <- tibble(obj = x[c(1, 2, 1, 3, 2, 3, 3, 1, 3, 2, 1, 3, 2, 3, 3, 1, 3, 2)])
df %>%
mutate(ID = cumsum(replace_na(obj == "start", 0)),
ID = if_else(ID == cumsum(replace_na(obj == "end", 0)) & is.na(obj), NA_integer_, ID))
# A tibble: 18 x 2
obj ID
<chr> <int>
1 start 1
2 end 1
3 start 2
4 NA 2
5 end 2
6 NA NA
7 NA NA
8 start 3
9 NA 3
10 end 3
11 start 4
12 NA 4
13 end 4
14 NA NA
15 NA NA
16 start 5
17 NA 5
18 end 5
CodePudding user response:
Using the rle
and a few cumsum
s.
rl <- rle(dat$obj)
nid <- rl$values == '' & cumsum(rl$values == 'start') == cumsum(rl$values == 'end')
rl$values <- cumsum(rl$values == 'start')
rl$values[nid] <- ''
transform(dat, ID=rep(rl$values, rl$lengths))
# obj ID
# 1 start 1
# 2 end 1
# 3 start 2
# 4 2
# 5 end 2
# 6
# 7
# 8 start 3
# 9 3
# 10 end 3
# 11 start 4
# 12 4
# 13 end 4
# 14
# 15
# 16 start 5
# 17 5
# 18 end 5
Data:
dat <- structure(list(obj = c("start", "end", "start", "", "end", "",
"", "start", "", "end", "start", "", "end", "", "", "start",
"", "end")), row.names = c(NA, -18L), class = "data.frame")