I'd like to find the rows that group A and group B have the same values in the feature
column and the value
column.
My data:
group feature value
1 A f1 1
2 B f1 1
3 A f2 1
4 B f2 0
5 A f3 0
6 B f3 1
df = data.frame(group = rep(c("A", "B"), 3),
feature = c("f1", "f1", "f2", "f2", "f3", "f3"),
value = c(1,1,1,0,0,1))
Desired outputs:
- Shared:
group feature value
1 A f1 1
2 B f1 1
- Group-specific:
group feature value
1 A f2 1
2 B f2 0
3 A f3 0
4 B f3 1
I've got the results by doing it manually:
- split the dataframe by the
group
column --> get two dataframe, one only has groupA
and another only has groupB
. - create a new column
feature_value
usingpaste(df$feature, df$value, sep = "_")
. - use
intersect
andsetdiff
to find the shared and group-specificfeature_value
. - filter the orginal
df
using the extractedfeature_value
.
This is laborious and I hope there is a better way.
CodePudding user response:
Solution 1: Grouped Filter
Group by feature
, then filter:
library(dplyr)
df %>%
group_by(feature) %>%
filter(n_distinct(value) == 1) %>%
ungroup()
# A tibble: 2 × 3
group feature value
<chr> <chr> <dbl>
1 A f1 1
2 B f1 1
Same code, but with !=
instead of ==
:
# A tibble: 4 × 3
feature group value
<chr> <chr> <dbl>
1 f2 A 1
2 f2 B 0
3 f3 A 0
4 f3 B 1
Solution 2: Pivoting
A bit more cumbersome, but may be more flexible e.g. if you want to make comparisons among >2 features: pivot wider, then filter, then pivot back to long:
library(dplyr)
library(tidyr)
df_wide <- df %>%
pivot_wider(names_from = group)
shared <- df_wide %>%
filter(A == B) %>%
pivot_longer(A:B, names_to = "group")
group_specific <- df_wide %>%
filter(A != B) %>%
pivot_longer(A:B, names_to = "group")
CodePudding user response:
I think this is what are you looking for?
For the first desired output:
library(data.table)
setDT(df)
df[, .SD[.I[.N > 1]], by = .(feature, value)]
feature value group
1: f1 1 A
2: f1 1 B
And, for the second desired output, we could do:
df[, .(group = df$group[.I[.N == 1]]), by = .(feature, value)]
feature value group
1: f2 1 A
2: f2 0 B
3: f3 0 A
4: f3 1 B