I have a data frame column like this in R:
dt = data.frame(input=c(0,0,1,1,1,0,1,0,0,0,1,1,1,0,1))
dt
#    input
# 1      0
# 2      0
# 3      1
# 4      1
# 5      1
# 6      0
# 7      1
# 8      0
# 9      0
# 10     0
# 11     1
# 12     1
# 13     1
# 14     0
# 15     1
I want to replace each run of consecutive 0s that is shorter than three with 1s, and save the result in a new column.
For example, I want to output:
#    input output
# 1      0      0
# 2      0      0
# 3      1      1
# 4      1      1
# 5      1      1
# 6      0      1
# 7      1      1
# 8      0      0
# 9      0      0
# 10     0      0
# 11     1      1
# 12     1      1
# 13     1      1
# 14     0      1
# 15     1      1
How can I write this in a foreach loop? (My data has thousands of rows.)
Thanks.
CodePudding user response:
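Create a grouping column with rleid on the 'input' column. If a group has fewer than 3 rows and all of its values are 0, replace them with 1; otherwise return input unchanged. The cumulative sum column 'new' is 0 only for a leading run of zeros, so the extra check all(new > 0) leaves rows 1 and 2 as they are, matching the expected output.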
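library(dplyr)
library(data.table)
dt %>%
  # new stays 0 until the first 1 appears, which marks a leading run of zeros
  mutate(new = cumsum(input)) %>%
  # grp is a run id: it changes whenever the value of input changes
  group_by(grp = rleid(input)) %>%
  mutate(output = if(n() < 3 && all(input == 0) && all(new > 0)) 1 else input) %>%
  ungroup %>%
  select(-grp, -new)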
Output:
# A tibble: 15 × 2
input output
<dbl> <dbl>
1 0 0
2 0 0
3 1 1
4 1 1
5 1 1
6 0 1
7 1 1
8 0 0
9 0 0
10 0 0
11 1 1
12 1 1
13 1 1
14 0 1
15 1 1
Or use base R with rle:
dt$output <- inverse.rle(within.list(rle(dt$input),
values[!values & lengths < 3 & seq_along(values) != 1] <- 1))
dt$output
#[1] 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1
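Since the question asks about a loop, the same run-length idea can also be sketched as an explicit for loop over the runs. This is only an illustration, not necessarily the fastest option: it assumes the first run is left unchanged (as in the expected output), and the names r, ends, starts and out are introduced here for the example.

```r
dt <- data.frame(input = c(0,0,1,1,1,0,1,0,0,0,1,1,1,0,1))

r <- rle(dt$input)               # run values and run lengths
ends <- cumsum(r$lengths)        # last row index of each run
starts <- ends - r$lengths + 1   # first row index of each run

out <- dt$input
for (i in seq_along(r$values)) {
  # fill zero runs shorter than 3; the first run is skipped (assumption)
  if (i > 1 && r$values[i] == 0 && r$lengths[i] < 3) {
    out[starts[i]:ends[i]] <- 1
  }
}
dt$output <- out
dt$output
#[1] 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1
```

The loop runs once per run rather than once per row, so it stays cheap for a few thousand rows.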
CodePudding user response:
Here is a suggestion, although I don't understand rows 1 and 2 in your expected output. The rule "replace consecutive 0, in which the length is less than three, with 1" also applies to rows 1 and 2, so they are replaced here.
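library(dplyr)
dt %>%
  # x: run id, incremented whenever the value of input changes
  mutate(x = cumsum(input != lag(input, default = first(input)))) %>%
  group_by(x) %>%
  # len: length of the current run, kept in a new column
  # because the grouping column x cannot be modified inside mutate
  mutate(len = n(),
         output = case_when(input == 0 & len <= 2 ~ 1,
                            TRUE ~ as.numeric(input))) %>%
  ungroup() %>%
  select(-x, -len)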
input output
<dbl> <dbl>
1 0 1
2 0 1
3 1 1
4 1 1
5 1 1
6 0 1
7 1 1
8 0 0
9 0 0
10 0 0
11 1 1
12 1 1
13 1 1
14 0 1
15 1 1
CodePudding user response:
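I understood the requirements the same way Tarjae did, so here is another tidyverse option. Grouping by cumsum(input) puts each 1 together with the zeros that follow it, so a group of size 2 or 3 is a 1 followed by one or two zeros. Note that the leading zeros form a group without a 1 in front, so a leading run of exactly three zeros would also be replaced.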
library(dplyr)
dt %>%
  mutate(x = cumsum(input)) %>%
  group_by(x) %>%
  mutate(y = (n() %in% 2:3)) %>%
  ungroup() %>%
  transmute(input = input,
            inputX = if_else(y, 1, input))
# # A tibble: 15 x 2
# input inputX
# <dbl> <dbl>
# 1 0 1
# 2 0 1
# 3 1 1
# 4 1 1
# 5 1 1
# 6 0 1
# 7 1 1
# 8 0 0
# 9 0 0
# 10 0 0
# 11 1 1
# 12 1 1
# 13 1 1
# 14 0 1
# 15 1 1
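The answers above differ only in how a leading run of zeros is treated. A small base R helper could make that choice explicit. This is a minimal sketch: the function name fill_short_zero_runs and its arguments min_len and keep_leading are hypothetical, introduced here for illustration only.

```r
# Replace runs of 0 shorter than min_len with 1.
# keep_leading = TRUE leaves a run of zeros at the very start untouched.
fill_short_zero_runs <- function(x, min_len = 3, keep_leading = TRUE) {
  r <- rle(x)
  short <- r$values == 0 & r$lengths < min_len
  if (keep_leading && length(short) > 0) short[1] <- FALSE
  r$values[short] <- 1
  inverse.rle(r)
}

x <- c(0,0,1,1,1,0,1,0,0,0,1,1,1,0,1)
fill_short_zero_runs(x)
#[1] 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1
fill_short_zero_runs(x, keep_leading = FALSE)
#[1] 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1
```

The first call reproduces the output expected in the question, and the second reproduces the reading where rows 1 and 2 are replaced as well.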