I have a dataset grouped by the values in one column. I would like to split it into subsets depending on whether the values of another column are all equal within each group or not. For example:
library(dplyr)
df <- data.frame(
  context = c("A","A","B","B","C","C","D","D"),
  reference = c(1,1,2,4,5,5,4,1)
)
df %>% group_by(context)
How could I create a subset in which A and C are included since their values in reference are equal, and B and D in another subset since their values in reference are dissimilar?
The desired outcome would be
subset 1    subset 2
A 1         B 2
A 1         B 4
C 5         D 4
C 5         D 1
Thank you in advance
CodePudding user response:
In base R, using ave and split:
split(df, ave(df$reference, df$context, FUN=\(x) length(unique(x)) == 1))
#split(df, ave(df$reference, df$context, FUN=\(x) all(x==x[1]))) #Alternative
#$`0`
# context reference
#3 B 2
#4 B 4
#7 D 4
#8 D 1
#
#$`1`
# context reference
#1 A 1
#2 A 1
#5 C 5
#6 C 5
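To make the intermediate step visible, here is a minimal sketch of what ave returns before split uses it. The labels "dissimilar" and "equal" are illustrative names chosen here, not part of the answer above.

```r
df <- data.frame(
  context   = c("A","A","B","B","C","C","D","D"),
  reference = c(1,1,2,4,5,5,4,1)
)

# ave() evaluates FUN once per context and repeats the result for every row
# of that context. Because df$reference is numeric, the logical result is
# stored as 0 (values differ) or 1 (values all equal).
same <- ave(df$reference, df$context, FUN = function(x) length(unique(x)) == 1)
same
# 1 1 0 0 1 1 0 0

# Turning the indicator into a labelled factor gives the list readable names
# (the labels are hypothetical) and keeps both elements even if one is empty.
parts <- split(df, factor(same, levels = c(0, 1), labels = c("dissimilar", "equal")))
parts$equal
parts$dissimilar
```

Each subset can then be accessed by name instead of by the `0` and `1` list names.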
CodePudding user response:
You could add an indicator column that flags whether all the values within a group are the same, and then split the data on that indicator:
df %>%
group_by(context) %>%
mutate(all_same=all(reference==first(reference))) %>%
group_by(all_same) %>%
group_split()
# [[1]]
# # A tibble: 4 × 3
# context reference all_same
# <chr> <dbl> <lgl>
# 1 B 2 FALSE
# 2 B 4 FALSE
# 3 D 4 FALSE
# 4 D 1 FALSE
#
# [[2]]
# # A tibble: 4 × 3
# context reference all_same
# <chr> <dbl> <lgl>
# 1 A 1 TRUE
# 2 A 1 TRUE
# 3 C 5 TRUE
# 4 C 5 TRUE
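If the helper column is not wanted in the result, one possible variation is to ungroup first and let group_split drop the splitting column through its .keep argument. This is a sketch under the assumption that a dplyr version providing group_split(.keep = ) is installed; the object name parts is made up for illustration.

```r
library(dplyr)

df <- data.frame(
  context   = c("A","A","B","B","C","C","D","D"),
  reference = c(1,1,2,4,5,5,4,1)
)

parts <- df %>%
  group_by(context) %>%
  mutate(all_same = all(reference == first(reference))) %>%
  ungroup() %>%                          # group_split() ignores ... on grouped data
  group_split(all_same, .keep = FALSE)   # split on the indicator, then drop it

# FALSE sorts before TRUE, so the dissimilar groups come first.
parts[[1]]
parts[[2]]
```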
CodePudding user response:
To get the subset where groups are dissimilar, you can do
df %>%
group_by(context) %>%
filter(n_distinct(reference) > 1)
#> # A tibble: 4 x 2
#> # Groups: context [2]
#> context reference
#> <chr> <dbl>
#> 1 B 2
#> 2 B 4
#> 3 D 4
#> 4 D 1
And for groups where the reference values are all the same, you can do
df %>%
group_by(context) %>%
filter(n_distinct(reference) == 1)
#> # A tibble: 4 x 2
#> # Groups: context [2]
#> context reference
#> <chr> <dbl>
#> 1 A 1
#> 2 A 1
#> 3 C 5
#> 4 C 5
Created on 2022-08-30 with reprex v2.0.2
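The two filters can also be collected into one named list, which is closer to the "subset 1 / subset 2" layout in the question. This is only a sketch; the names grouped, subsets, equal and dissimilar are chosen here for illustration.

```r
library(dplyr)

df <- data.frame(
  context   = c("A","A","B","B","C","C","D","D"),
  reference = c(1,1,2,4,5,5,4,1)
)

grouped <- df %>% group_by(context)

subsets <- list(
  # groups whose reference values are all identical
  equal      = grouped %>% filter(n_distinct(reference) == 1) %>% ungroup(),
  # groups containing at least two different reference values
  dissimilar = grouped %>% filter(n_distinct(reference) > 1) %>% ungroup()
)

subsets$equal
subsets$dissimilar
```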
CodePudding user response:
Using fndistinct from collapse:
library(collapse)
with(df, fndistinct(reference, context, TRA = 1) == 1) |>
split(df, f = _)
Output:
$`FALSE`
context reference
3 B 2
4 B 4
7 D 4
8 D 1
$`TRUE`
context reference
1 A 1
2 A 1
5 C 5
6 C 5