Skip specific observations when using row_number() - R

Time:10-02

I'm essentially after a "next" statement which I can use within a dplyr ifelse statement, although other R alternatives are also welcome.

Here's the code so far:

df1 <- data%>%
  arrange(Var1, Var2, Var3, Var4, Var5)%>%
  group_by(Var1)%>%
  distinct(Var1, Var2, Var3, Var4, Var5)%>%
  mutate(Var6 = ifelse(Var4 == "COMPLETE", row_number(), row_number() + 1))

The output (relevant columns only) is:

  | Var4         | Var6         |
  | ------------ | -------------|
  | COMPLETE     | 1            |
**| INCOMPLETE   | 3            |**
  | COMPLETE     | 3            |
  | COMPLETE     | 4            |
  | COMPLETE     | 5            |
**| INCOMPLETE   | 7            |**
  | COMPLETE     | 7            |
  | COMPLETE     | 8            |
  | COMPLETE     | 9            |


The intended output is:


  | Var4         | Var6         |
  | ------------ | -------------|
  | COMPLETE     | 1            |
**| INCOMPLETE   | 2            |**
  | COMPLETE     | 2            |
  | COMPLETE     | 3            |
  | COMPLETE     | 4            |
**| INCOMPLETE   | 5            |**
  | COMPLETE     | 5            |
  | COMPLETE     | 6            |
  | COMPLETE     | 7            |

In summary, my goal is that when Var4 == INCOMPLETE I am able to ignore that row and continue with row_number().

CodePudding user response:

Here is one way

library(data.table)
library(dplyr)
library(tidyr)
setDT(df1)[Var4 == "COMPLETE", Var6 := .I]
df1 %>% 
   fill(Var6, .direction = "updown")

-output

        Var4 Var6
1:   COMPLETE    1
2: INCOMPLETE    2
3:   COMPLETE    2
4:   COMPLETE    3
5:   COMPLETE    4
6: INCOMPLETE    5
7:   COMPLETE    5
8:   COMPLETE    6
9:   COMPLETE    7
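
If you would rather do the fill step in data.table as well, something along these lines should work. This is only a sketch: it assumes data.table >= 1.12.4 (for nafill()), the object name dt is mine, and I have swapped .I for seq_len(.N) so the counter is explicitly the position within the COMPLETE rows.

```r
library(data.table)

# Same example data as in the "data" section of this answer
dt <- data.table(Var4 = c("COMPLETE", "INCOMPLETE", "COMPLETE",
                          "COMPLETE", "COMPLETE", "INCOMPLETE",
                          "COMPLETE", "COMPLETE", "COMPLETE"))

# Number only the COMPLETE rows; INCOMPLETE rows are left as NA
dt[Var4 == "COMPLETE", Var6 := seq_len(.N)]

# Fill each NA from the next non-missing value ("next observation carried backward")
dt[, Var6 := nafill(Var6, type = "nocb")]

print(dt)
```

Note that a trailing INCOMPLETE row would stay NA here, because there is no later COMPLETE row to borrow a number from.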

Or with tidyverse

df1 %>% 
   mutate(Var6 = na_if(replace(Var4, Var4 == "COMPLETE", 
     seq_len(sum(Var4 == "COMPLETE"))), "INCOMPLETE")) %>%
   fill(Var6, .direction = "updown")
        Var4 Var6
1   COMPLETE    1
2 INCOMPLETE    2
3   COMPLETE    2
4   COMPLETE    3
5   COMPLETE    4
6 INCOMPLETE    5
7   COMPLETE    5
8   COMPLETE    6
9   COMPLETE    7
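
One thing to be aware of: because replace() writes the counter into the character vector Var4, Var6 ends up as a character column in the version above. If you need integers, a possible variant is to start from an all-NA integer vector instead. This is a sketch rather than a drop-in; the object name res is mine.

```r
library(dplyr)
library(tidyr)

# Same example data as in the "data" section of this answer
df1 <- data.frame(Var4 = c("COMPLETE", "INCOMPLETE", "COMPLETE",
                           "COMPLETE", "COMPLETE", "INCOMPLETE",
                           "COMPLETE", "COMPLETE", "COMPLETE"))

res <- df1 %>%
  # Start from NA_integer_ and write the counter only at the COMPLETE positions
  mutate(Var6 = replace(rep(NA_integer_, n()),
                        Var4 == "COMPLETE",
                        seq_len(sum(Var4 == "COMPLETE")))) %>%
  # "updown" fills from the next value first, then from the previous one
  fill(Var6, .direction = "updown")

print(res)
```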

data

df1 <- structure(list(Var4 = c("COMPLETE", "INCOMPLETE", "COMPLETE", 
"COMPLETE", "COMPLETE", "INCOMPLETE", "COMPLETE", "COMPLETE", 
"COMPLETE")), class = "data.frame", row.names = c(NA, -9L))

CodePudding user response:

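We can use cumsum together with replace or case_when:

# add 1 only at the INCOMPLETE positions
df1 %>% mutate(Var6 = cumsum(Var4 == 'COMPLETE') %>% replace(., Var4 == 'INCOMPLETE', .[Var4 == 'INCOMPLETE'] + 1L))

#OR

# the braces stop the pipe from also passing . as the first argument of case_when
df1 %>% mutate(Var6 = cumsum(Var4 == 'COMPLETE') %>% {case_when(Var4 == 'INCOMPLETE' ~ . + 1L, TRUE ~ .)})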
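
Since the question groups by Var1, the same cumsum idea can also be written as plain arithmetic and applied per group. A minimal sketch, assuming Var4 only ever takes the two values shown; the data frame df2 and its Var1 values are made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical two-group data; Var1 is invented for this example
df2 <- data.frame(
  Var1 = rep(c("a", "b"), times = c(5, 4)),
  Var4 = c("COMPLETE", "INCOMPLETE", "COMPLETE", "COMPLETE", "COMPLETE",
           "INCOMPLETE", "COMPLETE", "COMPLETE", "COMPLETE")
)

res <- df2 %>%
  group_by(Var1) %>%
  # Running count of COMPLETE rows, plus 1 on INCOMPLETE rows so that they
  # take the number of the next COMPLETE row
  mutate(Var6 = cumsum(Var4 == "COMPLETE") + (Var4 == "INCOMPLETE")) %>%
  ungroup()

print(res)
```

The counter restarts in each Var1 group, and an INCOMPLETE row at the start of a group gets 1, the same number as the COMPLETE row that follows it.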