Subsetting grouped rows by condition

I have a dataset grouped by the values of one column. I would like to split it into subsets depending on whether the values of ANOTHER column are all equal or differ within each group. For example:

df <- data.frame(
  context=c("A","A","B","B","C","C","D","D"),
  reference=c(1,1,2,4,5,5,4,1)
)

library(dplyr)  # provides %>% and group_by()
df %>% group_by(context)

How could I create one subset containing A and C, because their reference values are equal within the group, and another subset containing B and D, because their reference values differ?

The desired outcome would be

subset 1    subset 2
A    1      B    2
A    1      B    4
C    5      D    4
C    5      D    1

Thank you in advance

Answer:

In base R, using ave() and split():

# \(x) is the anonymous-function shorthand available since R 4.1.0
split(df, ave(df$reference, df$context, FUN=\(x) length(unique(x)) == 1))
#split(df, ave(df$reference, df$context, FUN=\(x) all(x==x[1]))) #Alternative
#$`0`
#  context reference
#3       B         2
#4       B         4
#7       D         4
#8       D         1
#
#$`1`
#  context reference
#1       A         1
#2       A         1
#5       C         5
#6       C         5
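Because reference is numeric, ave() returns numbers, so the logical flag is coerced to 0/1 and the list elements are named `0` and `1`. If descriptive names are preferred, a base R sketch along these lines should work: it computes one flag per group with tapply() and maps it back onto the rows by group name. The labels "equal" and "different" are arbitrary choices for illustration.

```r
df <- data.frame(
  context = c("A", "A", "B", "B", "C", "C", "D", "D"),
  reference = c(1, 1, 2, 4, 5, 5, 4, 1)
)

# One flag per group: TRUE when all reference values in the group are identical
same_by_group <- tapply(df$reference, df$context, function(x) length(unique(x)) == 1)

# Look up each row's group flag by name; as.vector() drops the array attributes
row_same <- as.vector(same_by_group[df$context])

# "equal"/"different" are illustrative labels, not anything required by split()
subsets <- split(df, ifelse(row_same, "equal", "different"))
subsets$equal
subsets$different
```

split() orders the list by the sorted labels, so "different" comes before "equal".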

Answer:

You could add an indicator of whether all the values in a group are the same, and then split the data using that indicator:

df %>% 
  group_by(context) %>% 
  mutate(all_same=all(reference==first(reference))) %>% 
  group_by(all_same) %>% 
  group_split()

# [[1]]
# # A tibble: 4 × 3
#   context reference all_same
#   <chr>       <dbl> <lgl>
# 1 B               2 FALSE   
# 2 B               4 FALSE   
# 3 D               4 FALSE   
# 4 D               1 FALSE   
# 
# [[2]]
# # A tibble: 4 × 3
#   context reference all_same
#   <chr>       <dbl> <lgl>
# 1 A               1 TRUE    
# 2 A               1 TRUE    
# 3 C               5 TRUE    
# 4 C               5 TRUE
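group_split() returns an unnamed list, and each element still carries the helper all_same column. If you would rather index the subsets by the flag and get back only the original columns, one option (a sketch, not the only way) is to ungroup and hand the flag to base split():

```r
library(dplyr)

df <- data.frame(
  context = c("A", "A", "B", "B", "C", "C", "D", "D"),
  reference = c(1, 1, 2, 4, 5, 5, 4, 1)
)

# Same indicator as above, computed per context group, then grouping removed
flagged <- df %>%
  group_by(context) %>%
  mutate(all_same = all(reference == first(reference))) %>%
  ungroup()

# Drop the helper column, then split rows by the flag;
# the list is named "FALSE" (values differ) and "TRUE" (values equal)
subsets <- split(select(flagged, -all_same), flagged$all_same)
subsets[["TRUE"]]
subsets[["FALSE"]]
```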

Answer:

To get the subset of groups whose reference values differ, you can use:

df %>% 
  group_by(context) %>%
  filter(n_distinct(reference) > 1)
#> # A tibble: 4 x 2
#> # Groups:   context [2]
#>   context reference
#>   <chr>       <dbl>
#> 1 B               2
#> 2 B               4
#> 3 D               4
#> 4 D               1

And for groups where the reference values are all the same, you can use:

df %>% 
  group_by(context) %>%
  filter(n_distinct(reference) == 1)
#> # A tibble: 4 x 2
#> # Groups:   context [2]
#>   context reference
#>   <chr>       <dbl>
#> 1 A               1
#> 2 A               1
#> 3 C               5
#> 4 C               5

Created on 2022-08-30 with reprex v2.0.2
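Both results are still grouped by context (note the `# Groups:` line in the output). If plain tibbles are wanted, ungroup() removes the grouping. Because n_distinct(reference) == 1 and n_distinct(reference) > 1 are complementary for the non-empty groups that group_by() creates, every row lands in exactly one of the two subsets. A short sketch that builds both and checks this:

```r
library(dplyr)

df <- data.frame(
  context = c("A", "A", "B", "B", "C", "C", "D", "D"),
  reference = c(1, 1, 2, 4, 5, 5, 4, 1)
)

# Groups whose reference values are all identical
equal_groups <- df %>%
  group_by(context) %>%
  filter(n_distinct(reference) == 1) %>%
  ungroup()

# Groups with at least two distinct reference values
different_groups <- df %>%
  group_by(context) %>%
  filter(n_distinct(reference) > 1) %>%
  ungroup()

# The two conditions are complementary, so the subsets partition df
stopifnot(nrow(equal_groups) + nrow(different_groups) == nrow(df))
```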

Answer:

Using fndistinct() from the collapse package:

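library(collapse)
# TRA = 1 repeats each group's distinct count on every row of that group;
# the `_` placeholder of the native pipe requires R >= 4.2.0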
with(df, fndistinct(reference, context, TRA = 1) == 1) |> 
    split(df, f = _)

Output:

$`FALSE`
  context reference
3       B         2
4       B         4
7       D         4
8       D         1

$`TRUE`
  context reference
1       A         1
2       A         1
5       C         5
6       C         5
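In the answer above, TRA = 1 makes fndistinct() return a value for every row, so it can be compared with 1 directly. To see the intermediate step, here is a sketch that computes the counts without TRA, which gives one value per group named by group (use.g.names = TRUE is the default), and maps them back onto the rows by name:

```r
library(collapse)

df <- data.frame(
  context = c("A", "A", "B", "B", "C", "C", "D", "D"),
  reference = c(1, 1, 2, 4, 5, 5, 4, 1)
)

# Without TRA: one distinct-value count per context group, named by group
counts <- fndistinct(df$reference, df$context)
counts

# Look up each row's group count by name and compare with 1,
# giving the same row-level flag as the TRA = 1 version
row_flag <- unname(counts[df$context]) == 1
split(df, row_flag)
```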