R: Replace runs of consecutive 0s shorter than three with 1

Time:10-10

I have a data frame like this in R:

dt = data.frame(input=c(0,0,1,1,1,0,1,0,0,0,1,1,1,0,1) )

dt
      input 
 # 1    0     
 # 2    0     
 # 3    1     
 # 4    1     
 # 5    1     
 # 6    0     
 # 7    1     
 # 8    0     
 # 9    0     
 # 10   0  
 # 11   1     
 # 12   1     
 # 13   1     
 # 14   0     
 # 15   1        

I want to replace every run of consecutive 0s that is shorter than three with 1s, and save the result to a new column.

For example, I want to output:

      input output
 # 1    0     0
 # 2    0     0
 # 3    1     1
 # 4    1     1
 # 5    1     1
 # 6    0     1
 # 7    1     1
 # 8    0     0
 # 9    0     0
 # 10   0     0
 # 11   1     1
 # 12   1     1
 # 13   1     1
 # 14   0     1
 # 15   1     1

How can I write this in a foreach loop? (My data has thousands of rows.)

Thanks.

CodePudding user response:

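Create a grouping column with rleid() on the 'input' column. Within each group, if the number of rows is less than 3, all values are 0, and the group is not the leading run (the cumulative sum of 'input' is above 0), replace the values with 1; otherwise return 'input' unchanged.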

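library(dplyr)
library(data.table)  # for rleid()

dt %>%
    # running count of 1s; it is 0 only inside a leading run of zeros
    mutate(new = cumsum(input)) %>%
    # one group per run of identical values
    group_by(grp = rleid(input)) %>%
    # replace short zero runs, except a run at the very start
    mutate(output = if (n() < 3 && all(input == 0) && all(new > 0)) 1 else input) %>%
    ungroup() %>%
    select(-grp, -new)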

Output:

# A tibble: 15 × 2
   input output
   <dbl>  <dbl>
 1     0      0
 2     0      0
 3     1      1
 4     1      1
 5     1      1
 6     0      1
 7     1      1
 8     0      0
 9     0      0
10     0      0
11     1      1
12     1      1
13     1      1
14     0      1
15     1      1

Or use base R with rle

dt$output <- inverse.rle(within.list(rle(dt$input), 
     values[!values & lengths < 3 & seq_along(values) != 1] <- 1))
dt$output
#[1] 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1
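
Since the question asks about a loop, the same rule can also be sketched as an explicit scan over the vector. This is only an illustration: the function name fill_short_zero_runs and its min_len argument are made up here, and it assumes (as in the expected output) that a run of zeros at the very start is left alone.

```r
# Hypothetical helper: name and arguments are illustrative only.
fill_short_zero_runs <- function(x, min_len = 3) {
  out <- x
  n <- length(x)
  i <- 1
  while (i <= n) {
    if (x[i] != 0) {          # not a zero: move on
      i <- i + 1
      next
    }
    j <- i                    # i is the first index of a zero run
    while (j < n && x[j + 1] == 0) j <- j + 1   # j becomes its last index
    # fill the run if it is short and does not start at position 1
    if (i > 1 && (j - i + 1) < min_len) out[i:j] <- 1
    i <- j + 1
  }
  out
}

input <- c(0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1)
output <- fill_short_zero_runs(input)
output
```

Each element is visited a bounded number of times, so a few thousand rows should not be a problem, although the rle version above is shorter.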

CodePudding user response:

Here is a suggestion. I don't understand rows 1 and 2 in your output, though: "replace consecutive 0, in which the length is less than three, with 1" also applies to rows 1 and 2.

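library(dplyr)

dt %>%
  # run id: increases by 1 each time the value changes
  mutate(run = cumsum(input != lag(input, default = first(input)))) %>%
  group_by(run) %>%
  # length of the current run
  mutate(len = n()) %>%
  ungroup() %>%
  # zeros in runs of length 1 or 2 become 1; everything else is kept
  mutate(output = case_when(input == 0 & len <= 2 ~ 1,
                            TRUE ~ as.numeric(input))) %>%
  select(-run, -len)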
   input output
   <dbl>  <dbl>
 1     0      1
 2     0      1
 3     1      1
 4     1      1
 5     1      1
 6     0      1
 7     1      1
 8     0      0
 9     0      0
10     0      0
11     1      1
12     1      1
13     1      1
14     0      1
15     1      1
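
To make the two readings of rows 1 and 2 easy to compare, here is a small base R sketch built on rle(). The helper name replace_short_zero_runs and the keep_leading switch are assumptions made for this illustration; they do not come from the question or from any package.

```r
# Hypothetical helper: `keep_leading` is an illustrative argument name.
replace_short_zero_runs <- function(x, min_len = 3, keep_leading = FALSE) {
  r <- rle(x)
  # runs of zeros that are shorter than min_len
  short <- r$values == 0 & r$lengths < min_len
  # optionally leave the very first run untouched
  if (keep_leading && length(short) > 0) short[1] <- FALSE
  r$values[short] <- 1
  inverse.rle(r)
}

input <- c(0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1)

literal  <- replace_short_zero_runs(input)                       # rows 1-2 become 1
as_asked <- replace_short_zero_runs(input, keep_leading = TRUE)  # rows 1-2 stay 0
```

With keep_leading = FALSE the result matches the output above; with keep_leading = TRUE it matches the output shown in the question.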

CodePudding user response:

Reading the requirements the same way Tarjae did, here is another tidyverse option.

library(dplyr)

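dt %>%
  # group id: each 1 starts a new group that also holds the zeros after it
  mutate(x = cumsum(input)) %>%
  group_by(x) %>%
  # 2 or 3 rows: a 1 followed by one or two zeros (or 2-3 leading zeros)
  mutate(y = n() %in% 2:3) %>%
  ungroup() %>%
  transmute(input = input,
            inputX = if_else(y, 1, input))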

# # A tibble: 15 x 2
#    input inputX
#    <dbl>  <dbl>
# 1      0      1
# 2      0      1
# 3      1      1
# 4      1      1
# 5      1      1
# 6      0      1
# 7      1      1
# 8      0      0
# 9      0      0
# 10     0      0
# 11     1      1
# 12     1      1
# 13     1      1
# 14     0      1
# 15     1      1
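
To see why the cumsum grouping works, it may help to look at the intermediate columns. The base R sketch below rebuilds them with ave(); the variable names grp, size and replaced are chosen only for this illustration.

```r
input <- c(0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1)

# cumsum() goes up at every 1 and stays flat across zeros, so each group
# is a single 1 together with the zeros that follow it
grp <- cumsum(input)

# number of rows in each group, repeated for every row of that group
size <- ave(input, grp, FUN = length)

# groups of size 2 or 3 are the ones whose rows are set to 1
steps <- data.frame(input, grp, size, replaced = size %in% 2:3)
steps
```

One thing to keep in mind: the first group (grp equal to 0) contains no 1, so a leading run of zeros is judged by its own length rather than by its length plus one.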