Conditionally remove the nth row of a group in `dplyr` in R

Time:06-23

I have this type of data, where Sequ is a grouping variable:

df <- data.frame(
  Sequ = c(1,1,1,1,
           2,2,2,
           3,3,3,
           4,4,4,4,
           5,5,5,
           6,6,6,6),
  Speaker = c("A","B",NA,"A",
              "B",NA,"C",
              "A",NA,"A",
              "A","C",NA,"A",
              "A",NA,"C",
              "B","A",NA,"C")
)

For each Sequ, I want to remove the second row on the condition that its Speaker value is not NA. I've tried this, but it removes whole Sequ groups instead of single rows:

library(dplyr) 
df %>%
  group_by(Sequ) %>%
  filter(!is.na(nth(Speaker,2)))

How can I obtain this desired output:

   Sequ Speaker
1     1       A
2     1    <NA>
3     1       A
4     2       B
5     2    <NA>
6     2       C
7     3       A
8     3    <NA>
9     3       A
10    4       A
11    4    <NA>
12    4       A
13    5       A
14    5    <NA>
16    5       C
17    6       B
18    6    <NA>
19    6       C

CodePudding user response:

With dplyr, keep a row unless it is the second row of its group and its Speaker is not NA:

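library(dplyr)
df %>%
  group_by(Sequ) %>%
  # keep every row that is not the 2nd in its group, plus any 2nd row whose Speaker is NA
  filter(row_number() != 2 | is.na(Speaker)) %>%
  ungroup()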
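
To see why the attempt in the question drops whole groups, it may help to compare the two conditions row by row. The sketch below assumes only the first two Sequ groups from the question's data; the column names `group_level` and `row_level` are made up for illustration. `nth(Speaker, 2)` yields a single value per group, so a condition built on it is the same for every row of that group, whereas `row_number()` is evaluated per row.

```r
library(dplyr)

# First two groups of the question's data (a reduced example)
df_small <- data.frame(
  Sequ    = c(1, 1, 1, 1, 2, 2, 2),
  Speaker = c("A", "B", NA, "A", "B", NA, "C")
)

checks <- df_small %>%
  group_by(Sequ) %>%
  mutate(
    # one value per group, recycled to every row of that group
    group_level = !is.na(nth(Speaker, 2)),
    # evaluated separately for each row
    row_level   = row_number() != 2 | is.na(Speaker)
  ) %>%
  ungroup()

print(checks)
```

In this reduced example `group_level` is all TRUE for Sequ 1 and all FALSE for Sequ 2, so filtering on it keeps or drops a group as a whole. `row_level` is FALSE only for the second row of Sequ 1, which is the one row that should go.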

CodePudding user response:

In base R, using ave() to compute each row's position within its group:

subset(df, ave(Sequ, Sequ, FUN=seq_along) != 2 | is.na(Speaker))
   Sequ Speaker
1     1       A
3     1    <NA>
4     1       A
5     2       B
6     2    <NA>
7     2       C
8     3       A
9     3    <NA>
10    3       A
11    4       A
13    4    <NA>
14    4       A
15    5       A
16    5    <NA>
17    5       C
18    6       B
20    6    <NA>
21    6       C
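
As a rough check, the within-group index that `ave()` builds can be inspected on its own and the base R result compared with the dplyr one. This is only a sketch on the question's data; the names `idx`, `base_res` and `dplyr_res` are introduced here for illustration.

```r
library(dplyr)

df <- data.frame(
  Sequ = c(1,1,1,1, 2,2,2, 3,3,3, 4,4,4,4, 5,5,5, 6,6,6,6),
  Speaker = c("A","B",NA,"A",
              "B",NA,"C",
              "A",NA,"A",
              "A","C",NA,"A",
              "A",NA,"C",
              "B","A",NA,"C")
)

# Position of each row within its Sequ group: 1,2,3,4,1,2,3,...
idx <- ave(df$Sequ, df$Sequ, FUN = seq_along)

# Base R: drop a row only if it is 2nd in its group and Speaker is not NA
base_res <- subset(df, idx != 2 | is.na(Speaker))

# dplyr equivalent, ungrouped so it compares as a plain table
dplyr_res <- df %>%
  group_by(Sequ) %>%
  filter(row_number() != 2 | is.na(Speaker)) %>%
  ungroup()

# Both approaches should keep the same rows
print(rownames(base_res))
print(all.equal(base_res$Speaker, dplyr_res$Speaker))
```

The row names of `base_res` are the original row numbers, which is why the printed base R output above skips 2, 12 and 19.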