Removing trailing 0s and 1s from a dataset in R

Time:11-17

I have a dataset that is set up like this:

bird outcome
a 0
a 0
a 1
a 1
b 0
b 1
b 0
c 1
c 1
c 1

For all birds whose last outcome was 0, I removed all trailing 0s and the last 1 that preceded the trail of 0s. I used the following code:

detect <- detect %>%
  group_by(bird) %>%
  mutate(new = cumsum(outcome)) %>%
  filter(if (last(outcome) == 0) new < max(new) else TRUE) %>%
  ungroup() %>%
  select(-new)

This code worked perfectly and produced this output:

bird outcome
a 0
a 0
a 1
a 1
b 0
c 1
c 1
c 1

Only b was trimmed because it was the only bird whose last remaining observation was 0. I would like to expand the code and have the last 1 observation trimmed for birds whose last observation was 1. I would like the output to look like this:

bird outcome
a 0
a 0
a 1
b 0
c 1
c 1

Birds whose last remaining observation was 1 had that last 1 removed, and birds whose last remaining observation was 0 had the trailing 0s and the last 1 preceding them removed. However, I want this trimming applied in a single pass, not one step after the other. For example, if a bird has outcome 0001100, I would like the trailing 0s and the last 1 removed to produce 0001. I don't want it trimmed again so that the remaining 1 is also removed.
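Both cases in this rule amount to dropping everything from the last 1 onward, which is why a single pass is enough. Below is a minimal base R sketch of that reading; `trim_tail` is a hypothetical helper name, not something from the post, and it assumes that a bird with no 1 at all loses every row (which is what the cumsum filter above does).

```r
# Hypothetical helper: trim one bird's outcome vector in a single pass.
# Everything from the last 1 onward is dropped, which covers both cases:
#   last outcome 0 -> the trailing 0s and the 1 before them go
#   last outcome 1 -> only that final 1 goes
trim_tail <- function(x) {
  ones <- which(x == 1)
  if (length(ones) == 0) return(x[0])  # assumption: no 1 at all -> keep nothing
  x[seq_len(max(ones) - 1)]
}

trim_tail(c(0, 0, 0, 1, 1, 0, 0))  # 0 0 0 1

# Same data as in the question, trimmed bird by bird
detect <- data.frame(
  bird    = c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c"),
  outcome = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1)
)
trimmed <- lapply(split(detect$outcome, detect$bird), trim_tail)
trimmed$a  # 0 0 1
trimmed$b  # 0
trimmed$c  # 1 1
```

Because the cut point is taken from the untrimmed vector, 0001100 stops at 0001 and is not trimmed a second time.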

CodePudding user response:

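library(dplyr)

detect %>% 
  group_by(bird) %>% 
  mutate(new = cumsum(outcome)) %>%
  # Decide once per bird, from its original last outcome, so nothing is trimmed twice:
  #   last outcome 0 -> drop the final 1 and the 0s after it
  #   last outcome 1 -> drop only the final row
  filter(if (last(outcome) == 0) new < max(new) else row_number() < n()) %>%
  select(-new) %>%
  ungroup()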
# A tibble: 6 × 2
#   bird  outcome
#   <chr>   <int>
# 1 a           0
# 2 a           0
# 3 a           1
# 4 b           0
# 5 c           1
# 6 c           1

Using this data:

detect = read.table(text = 'bird    outcome
a   0
a   0
a   1
a   1
b   0
b   1
b   0
c   1
c   1
c   1', header = TRUE)

CodePudding user response:

You could do:

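library(tidyverse)  # dplyr, stringr (str_c, str_remove), tidyr (separate_rows)
detect %>%
  group_by(bird) %>%
  # Strip either the last 1 plus its trailing 0s ("10+$") or a final 1 ("1$"), in one pass
  summarise(outcome = str_remove(str_c(outcome, collapse = ""), "(10+$)|(1$)")) %>%
  # Split the string back into one row per digit
  separate_rows(outcome, sep = "(?<=.)(?=.)", convert = TRUE)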

# A tibble: 6 x 2
  bird  outcome
  <chr>   <int>
1 a           0
2 a           0
3 a           1
4 b           0
5 c           1
6 c           1
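
To see what the pattern does on its own, here is a small base R sketch that applies it to plain strings with `sub()`. The helper name `trim_str` is made up for illustration, and the pattern is assumed to be `(10+$)|(1$)`, i.e. a 1 followed by one or more 0s at the end of the string, or a 1 at the very end.

```r
# Hypothetical helper: remove the first (and only possible) match of the pattern.
trim_str <- function(s) sub("(10+$)|(1$)", "", s)

trim_str("0001100")  # "0001"
trim_str("0011")     # "001"
trim_str("010")      # "0"
trim_str("111")      # "11"
trim_str("000")      # "000" (no 1, so nothing matches)
```

Only one match is removed per string, so 0001100 is not trimmed a second time. One thing worth checking on real data: a bird with no 1 at all is left untouched by this pattern, whereas the cumsum-based filter in the question drops all of its rows.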