Removing rows based on column conditions

Time:04-22

Suppose we have a data frame:

Event <- c("A", "A", "A", "B", "B", "C" , "C", "C")
Model <- c( 1, 2, 3, 1, 2, 1, 2, 3)

df <- data.frame(Event, Model)

Which looks like this:

Event Model
A 1
A 2
A 3
B 1
B 2
C 1
C 2
C 3

We can see that event B only has 2 models of data. The actual data frame I am using has thousands of rows and 17 columns, so how can I remove all events that do not have 3 models? My guess is to use subset(), but I am not sure how to do it when there is more than one condition.

I tried the suggested code from YH Jang below:

df %>% group_by(Event) %>% 
  filter(max(Model)==3) 

However, this would fail to remove an event whose data looked like this, because its maximum Model is still 3 even though only two models are present:

Event Model
A 1
A 3

For example, the filter keeps event A here:

# A tibble: 5 × 2
# Groups:   Event [2]
  Event Model
  <chr> <dbl>
1 A         1
2 A         3
3 C         1
4 C         2
5 C         3

CodePudding user response:

Using dplyr,

library(dplyr)
df %>% group_by(Event) %>% 
  filter(max(Model) == 3)  # comparison needs ==, not =

the result would be

# A tibble: 6 × 2
# Groups:   Event [2]
  Event Model
  <chr> <dbl>
1 A         1
2 A         2
3 A         3
4 C         1
5 C         2
6 C         3

or using data.table,

library(data.table); setDT(df)  # df must be a data.table for this syntax
df[df[,.I[max(Model)==3],by=Event]$V1]

the result is the same, as shown below.

   Event Model
1:     A     1
2:     A     2
3:     A     3
4:     C     1
5:     C     2
6:     C     3

EDIT
I misunderstood the question. Here's the edited answer.

# with dplyr
df %>% group_by(Event) %>% 
  filter(length(Model)>=3) 

or

# with data.table
df[df[,.I[length(Model)>=3],by=Event]$V1]
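
A minimal sketch of how the count-based filter differs from the max-based one on the case raised in the question. The data frame `df2` is hypothetical (it is not from the original post): event A is missing Model 2 and event B contains a duplicate Model. Using `n_distinct()` instead of `length()` is an assumption that duplicate Model values should not count twice; if duplicates cannot occur, `length(Model) >= 3` gives the same result.

```r
library(dplyr)

# Hypothetical data: A has only Models 1 and 3, B has Model 1 twice
df2 <- data.frame(
  Event = c("A", "A", "B", "B", "B", "C", "C", "C"),
  Model = c(1, 3, 1, 1, 2, 1, 2, 3)
)

# max(Model) == 3 keeps A, even though A has only two models
by_max <- df2 %>%
  group_by(Event) %>%
  filter(max(Model) == 3) %>%
  ungroup()

# Counting distinct models keeps only events that have all three
by_count <- df2 %>%
  group_by(Event) %>%
  filter(n_distinct(Model) == 3) %>%
  ungroup()

unique(by_max$Event)    # "A" "C"
unique(by_count$Event)  # "C"
```

Note that `length(Model) >= 3` would keep event B in this hypothetical data, because B has three rows but only two distinct models.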

CodePudding user response:

Try this:

library(dplyr)
df %>% group_by(Event) %>% 
  filter(length(Model) >= 3) 

This removes every row belonging to an event that has fewer than three Model entries.
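
Since the question mentions `subset()`, here is a rough base R sketch of the same idea without extra packages. It assumes that "has 3 models" means "has at least three rows for that event"; the variable names `rows_per_event` and `result` are only illustrative.

```r
# Rebuild the example data frame from the question
Event <- c("A", "A", "A", "B", "B", "C", "C", "C")
Model <- c(1, 2, 3, 1, 2, 1, 2, 3)
df <- data.frame(Event, Model)

# ave() returns, for every row, the number of rows in that row's Event group
rows_per_event <- ave(df$Model, df$Event, FUN = length)

# Keep only the rows whose event has at least three entries
result <- subset(df, rows_per_event >= 3)
result
```

With the example data this should leave the six rows for events A and C and drop both rows for event B.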
