Subset a dataframe - grouped and multiple values R-CodePudding

I am trying to subset a dataframe by multiple values in one column.

The input is the following:

Output should be:

So i want only the rows in the new dataframe, which contains 0 AND 1 in the column "Auto" - grouped for each shop.

Already tried this, but doesnt work:

test <- subset(rawdata, Auto == 0 &Auto == 1) test <- subset(rawdata, min(Auto) == 0 & max(Auto) == 1) test<- rawdata[ which(rawdata$Auto'==0 & rawdata$Auto== 1), ]

Thanks for any help. Regards

CodePudding user response：

Please do not add data as images, provide data in a reproducible format

You can select the Shop where both 0 and 1 are present.

library(dplyr)

df %>%
  group_by(Shop) %>%
  filter(all(c(0, 1) %in% Auto)) %>%
  ungroup

#   Shop Order  Auto
#  <dbl> <dbl> <dbl>
#1     1     1     0
#2     1     2     0
#3     1     3     1

data

df <- structure(list(Shop = c(1, 1, 1, 2, 2, 2, 3, 3, 3), Order = c(1, 
2, 3, 1, 2, 3, 1, 2, 3), Auto = c(0, 0, 1, 1, 1, 1, 0, 0, 0)), 
class = "data.frame", row.names = c(NA, -9L))

CodePudding user response：

It is not very clear what you are trying to do based of your question. If I interpreted it correctly, you want to keep every row of shops where 1s and 0s occur.

To do this one possible solution might be to count the number of rows that each shop has and check wether that value is the same as the sum of auto (means all 1s) or equal to 0 (means all 0s).

If that criteria is met you want all rows of the shop to be excluded.

Look into the function summarise.

CodePudding user response：

Is this what you're looking for?

library(magrittr)
library(dplyr)

#Toy data.
df <- data.frame(Shop = c(1, 1, 1, 2, 2, 2, 3, 3, 3), 
                 Order = c(1, 2, 3, 1, 2, 3, 1 , 2, 3), 
                 Auto = c(0, 0, 1, 1, 1, 1, 0, 0, 0))

#Solution.
df %>% 
  group_by(Shop) %>%
  filter(n_distinct(Auto) > 1) %>%
  ungroup()

# # A tibble: 3 × 3
#    Shop Order  Auto
#   <dbl> <dbl> <dbl>
# 1     1     1     0
# 2     1     2     0
# 3     1     3     1

The key idea here is using dplyr::n_distinct() to count the number of unique values in Auto within each Shop group, and subsequently retaining only those groups that have more than 1 n_distinct values.