Create run-length ID for subset of values

Time:05-03

Given a data frame like this:

df <- data.frame(
  x = c(3,3,1,12,2,2,10,10,10,1,5,5,2,2,17,17)
)

How can I create a new column that records the run-length ID for only a subset of the x values, say those from 3 to 20?

My own attempt only inserts NA where the run-length count should be interrupted; the underlying count still runs over every value of x, so the remaining IDs skip numbers:

library(dplyr)       # for %>% and mutate()
library(data.table)  # for rleid()
df %>%
  mutate(rle = ifelse(x %in% 3:20, rleid(x), NA))
    x rle
1   3   1
2   3   1
3   1  NA
4  12   3
5   2  NA
6   2  NA
7  10   5
8  10   5
9  10   5
10  1  NA
11  5   7
12  5   7
13  2  NA
14  2  NA
15 17   9
16 17   9
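
The cause becomes visible when the two steps are separated. The following is a minimal sketch, assuming the same `df` as above; the names `all_runs` and `masked` are only illustrative. `rleid()` numbers every run in `x`, and `ifelse()` merely hides the unwanted IDs afterwards without renumbering the rest.

```r
library(data.table)

df <- data.frame(
  x = c(3,3,1,12,2,2,10,10,10,1,5,5,2,2,17,17)
)

# rleid() assigns an ID to every run in x, including runs outside 3:20
all_runs <- rleid(df$x)
print(all_runs)

# ifelse() only replaces the unwanted IDs with NA; the kept IDs are unchanged
masked <- ifelse(df$x %in% 3:20, all_runs, NA)
print(masked)
```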

The expected result:

    x rle
1   3   1
2   3   1
3   1  NA
4  12   2
5   2  NA
6   2  NA
7  10   3
8  10   3
9  10   3
10  1  NA
11  5   4
12  5   4
13  2  NA
14  2  NA
15 17   5
16 17   5

CodePudding user response:

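With base R subsetting (the run IDs themselves still come from data.table::rleid):

idx <- df$x %in% 3:20                          # rows whose value is in the subset
df[idx, "rle"] <- data.table::rleid(df$x[idx]) # number runs among those rows only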

    x rle
1   3   1
2   3   1
3   1  NA
4  12   2
5   2  NA
6   2  NA
7  10   3
8  10   3
9  10   3
10  1  NA
11  5   4
12  5   4
13  2  NA
14  2  NA
15 17   5
16 17   5
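
If no package should be involved at all, the same result can probably be had from base `rle()`. This is a sketch under that assumption, not part of the answer above; `runs`, `keep` and `ids` are illustrative names. It numbers the runs whose value lies in 3:20 and expands the IDs back to row length.

```r
df <- data.frame(
  x = c(3,3,1,12,2,2,10,10,10,1,5,5,2,2,17,17)
)

runs <- rle(df$x)                    # one entry per run: $values and $lengths
keep <- runs$values %in% 3:20        # TRUE for runs whose value is in the subset

# count only the kept runs; runs outside the subset get NA
ids <- ifelse(keep, cumsum(keep), NA)

# expand the per-run IDs back to one ID per row
df$rle <- rep(ids, runs$lengths)
print(df)
```

Because the runs are detected on the full vector, two runs of the same value separated by an excluded value receive different IDs here, whereas numbering the filtered vector would merge them.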

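With left_join (this relies on each value in the subset forming only one run, because distinct() keeps a single row per value):

library(dplyr)
left_join(df, df %>%
  filter(x %in% 3:20) %>%
  distinct() %>%
  mutate(rle = row_number()))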

Joining, by = "x"
    x rle
1   3   1
2   3   1
3   1  NA
4  12   2
5   2  NA
6   2  NA
7  10   3
8  10   3
9  10   3
10  1  NA
11  5   4
12  5   4
13  2  NA
14  2  NA
15 17   5
16 17   5
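
When a value can recur in a later run, a dplyr variant that numbers runs instead of distinct values may be safer. The sketch below assumes dplyr 1.1.0 or later, which provides `consecutive_id()`; the helper column `run` is an illustrative name and is dropped at the end.

```r
library(dplyr)

df <- data.frame(
  x = c(3,3,1,12,2,2,10,10,10,1,5,5,2,2,17,17)
)

res <- df %>%
  mutate(
    run = consecutive_id(x),                      # ID for every run in x
    rle = match(run, unique(run[x %in% 3:20]))    # renumber only the kept runs
  ) %>%
  select(-run)

print(res)
```

`match()` returns NA for runs that are not in the lookup vector, which is what produces the NA rows for values outside 3:20.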

CodePudding user response:

With data.table:

library(data.table)
setDT(df)

df[x            
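
One common data.table idiom for this kind of task, shown here as a hedged sketch and not necessarily the code this answer used, is to filter rows in `i` and assign by reference in `j`. Rows that do not match the filter are left as NA.

```r
library(data.table)

df <- data.frame(
  x = c(3,3,1,12,2,2,10,10,10,1,5,5,2,2,17,17)
)
setDT(df)  # convert to data.table by reference

# within the filtered rows, x is the subset vector, so rleid() numbers only those runs
df[x %in% 3:20, rle := rleid(x)]
print(df)
```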