How to assign unique values (e.g. increasing numbers) to repeating column values in R?

Time:07-12

I'm having trouble solving the following problem:

The values start and end appear repeatedly in the obj column. I would like to mark each block with a unique number that increases every time a new start value is reached.

[Image: the problem and the desired ID column]

I also want rows where the obj column is empty and that are not in between a start and an end to remain empty in the ID column.

Thanks a lot in advance!

Answer:

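There might be a more streamlined way, but you can use cumsum() on logical vectors: a running count of start rows gives the block number, and comparing it with a running count of end rows identifies the empty rows that lie outside any block.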

library(dplyr)
library(tidyr)

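# example data: NA marks an empty cell
x <- c("start", "end", NA)
df <- tibble(obj = x[c(1, 2, 1, 3, 2, 3, 3, 1, 3, 2, 1, 3, 2, 3, 3, 1, 3, 2)])

df %>% 
  # running count of "start" rows = block number of every row
  mutate(ID = cumsum(replace_na(obj == "start", FALSE)),
         # empty rows where every opened block is already closed get NA
         ID = if_else(ID == cumsum(replace_na(obj == "end", FALSE)) & is.na(obj), NA_integer_, ID))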
# A tibble: 18 x 2
   obj      ID
   <chr> <int>
 1 start     1
 2 end       1
 3 start     2
 4 NA        2
 5 end       2
 6 NA       NA
 7 NA       NA
 8 start     3
 9 NA        3
10 end       3
11 start     4
12 NA        4
13 end       4
14 NA       NA
15 NA       NA
16 start     5
17 NA        5
18 end       5
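The same cumsum() idea can also be sketched in base R without dplyr or tidyr. This is an illustration rather than the answerer's code: the function name block_id() is hypothetical, and the sketch assumes blocks never nest and every start is eventually closed by an end. It treats both NA and "" as an empty cell, because this answer uses NA while the data in the other answer uses empty strings.

```r
# Hypothetical base-R helper (not from the answer above): number each
# start ... end block; empty cells (NA or "") outside every block become NA.
# Assumes blocks never nest and every "start" is closed by an "end".
block_id <- function(obj) {
  starts <- cumsum(obj %in% "start")  # blocks opened so far (NA counts as FALSE)
  ends   <- cumsum(obj %in% "end")    # blocks closed so far
  empty  <- is.na(obj) | obj == ""    # accept either encoding of an empty cell
  id <- starts
  id[empty & starts == ends] <- NA_integer_  # every opened block is closed here
  id
}

# The 18-row example from above, with NA marking empty cells
x   <- c("start", "end", NA)
obj <- x[c(1, 2, 1, 3, 2, 3, 3, 1, 3, 2, 1, 3, 2, 3, 3, 1, 3, 2)]

data.frame(obj, ID = block_id(obj))

# The same rows with "" instead of NA give the same IDs
obj_blank <- ifelse(is.na(obj), "", obj)
identical(block_id(obj_blank), block_id(obj))  # TRUE
```

Using %in% instead of == avoids the replace_na() step, since "NA %in% 'start'" is FALSE rather than NA.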

Answer:

Using rle() and a few cumsum() calls. The example data dat is defined at the end of this answer.

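rl <- rle(dat$obj)  # collapse consecutive identical values into runs
# empty runs outside any block: every "start" so far has been matched by an "end"
nid <- rl$values == '' & cumsum(rl$values == 'start') == cumsum(rl$values == 'end')
rl$values <- cumsum(rl$values == 'start')  # block number of each run
rl$values[nid] <- ''  # blank out runs between blocks (ID becomes character)

transform(dat, ID=rep(rl$values, rl$lengths))  # expand runs back to one value per row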
#      obj ID
# 1  start  1
# 2    end  1
# 3  start  2
# 4         2
# 5    end  2
# 6          
# 7          
# 8  start  3
# 9         3
# 10   end  3
# 11 start  4
# 12        4
# 13   end  4
# 14         
# 15         
# 16 start  5
# 17        5
# 18   end  5  
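Assigning '' into rl$values coerces the whole ID column to character, since an R vector holds a single type. If a numeric ID is preferred, a minimal variant (an adaptation, not the answerer's code) stores NA_integer_ for the runs between blocks instead and expands the runs with inverse.rle(), which does the same thing as rep(rl$values, rl$lengths). stringsAsFactors = FALSE is set only so that rle() also receives a character vector on R versions before 4.0.

```r
# Same 18 rows as the Data: block ("" marks an empty cell)
dat <- data.frame(obj = c("start", "end", "start", "", "end", "", "",
                          "start", "", "end", "start", "", "end", "", "",
                          "start", "", "end"),
                  stringsAsFactors = FALSE)

rl <- rle(dat$obj)  # consecutive "" rows collapse into a single run
is_start <- rl$values == "start"
is_end   <- rl$values == "end"
# empty runs where every opened block has already been closed
nid <- rl$values == "" & cumsum(is_start) == cumsum(is_end)

rl$values <- cumsum(is_start)      # block number of each run (integer)
rl$values[nid] <- NA_integer_      # NA instead of '' keeps the column integer

dat$ID <- inverse.rle(rl)          # one value per original row
print(dat)
```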

Data:

dat <- structure(list(obj = c("start", "end", "start", "", "end", "", 
"", "start", "", "end", "start", "", "end", "", "", "start", 
"", "end")), row.names = c(NA, -18L), class = "data.frame")