This may be a very elementary question using dplyr and tidyverse tools, but I couldn't find a good way to do it.
Let's suppose I have a data frame in a wide format, and I want to select rows so that a subset of columns has all the same value. Naively, I can do the following:
> df <- tribble(
~name, ~id, ~cost, ~value1 , ~value2, ~value3,
"a", 1, 10, 1, 1, 1,
"a", 2, 20, 1, 2, 1,
"b", 3, 50, 1, 1, 3,
"b", 4, 45, 1, 1, 1,
"b", 5, 70, 2, 2, 2
)
> df %>% select(
value1 == value2 &
value1 == value3 &
value2 == value
)
# A tibble: 3 × 6
name id cost value1 value2 value3
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 1 10 1 1 1
2 b 4 45 1 1 1
3 b 5 70 2 2 2
Now, let's suppose the number of the columns to be compared is very large (> 10). All columns start with value
, so that we may have value_something, value_otherthing, value_morething
, i.e., not necessarily numeric as in this example. However, if the number of columns is n
, naively I have to create n * (n - 1) / 2
comparisons, which is clearly unmanageable.
Is there something like
df %>% filter(all_same(starts_with("value")))
where all_same()
compares all selected columns by starts_with()
(or any other selector)?
rowwise()
and
across()
didn't help me too much either.
CodePudding user response:
We may use if_all
to loop over the columns from 'value2' to 'value3', check if the column values are equal with value1
, if_all
returns TRUE only for a row where all the column comparisons are TRUE
library(dplyr)
df %>%
filter(if_all(value2:value3, ~ value1 == .x))
-output
# A tibble: 3 × 6
name id cost value1 value2 value3
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 1 10 1 1 1
2 b 4 45 1 1 1
3 b 5 70 2 2 2
Or if we want to use starts_with
df %>%
filter(if_all(starts_with('value'), ~ value1 == .x))
CodePudding user response:
Here's a possible base R option, where we can count the number of unique values to see if there is only 1 for each row (and just for the "value" columns).
df[apply(df[, -c(1:3)], 1, function(x) length(unique(x)) == 1), ]
Output
name id cost value1 value2 value3
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 1 10 1 1 1
2 b 4 45 1 1 1
3 b 5 70 2 2 2