So I've been trying to figure out how to subset a dataframe so that it keeps only the columns in which a specific string appears in at least one cell, and drops all the others. In this case, I'm searching for 'other' and would like to go from this:
A | B | C | D |
---|---|---|---|
other | one | two | three |
one | other | two | three |
two | three | one | other |
to this:
A | B | D |
---|---|---|
other | one | three |
one | other | three |
two | three | other |
I know how to select columns by their names, but not by what their cells contain. Is there a neat way of doing this?
CodePudding user response:
Using tidyverse you can do:
library(tidyverse)
d <- read.table(text = "A B C D
other one two three
one other two three
two three one other", header = TRUE)
# where() supersedes select_if() as of dplyr 1.0.0
d %>%
  select(where(~ any(.x == "other")))
# A B D
# 1 other one three
# 2 one other three
# 3 two three other
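If the data may contain missing values, the predicate above can return NA instead of TRUE or FALSE, which column selection does not accept. Below is a minimal base R sketch of the same idea that sidesteps this, assuming that an exact match on the whole cell is what is wanted. The names keep and res are only illustrative.

```r
d <- read.table(text = "A B C D
other one two three
one other two three
two three one other", header = TRUE)

# One TRUE/FALSE per column: does any cell equal "other"?
# na.rm = TRUE makes a column of NAs and non-matches count as FALSE, not NA.
keep <- vapply(d, function(col) any(col == "other", na.rm = TRUE), logical(1))

# drop = FALSE keeps the result a data frame even if only one column matches.
res <- d[, keep, drop = FALSE]
res
```

To match 'other' as a substring rather than the whole cell, the comparison could be swapped for grepl("other", col, fixed = TRUE).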
CodePudding user response:
An alternative using map_lgl from the purrr package. map_lgl loops through every column, applying the lambda function passed as the formula ~ any(. == 'other'). Here ~ is shorthand for function(.x), and . refers to the same argument as .x. The output is a logical vector, with one element per column, that we can use to subset df.
library(tidyverse)
df <- read.table(text = "A B C D
other one two three
one other two three
two three one other", header = TRUE)
# drop = FALSE keeps a data frame even when only one column matches
df[, map_lgl(df, ~ any(. == 'other')), drop = FALSE]
#> A B D
#> 1 other one three
#> 2 one other three
#> 3 two three other
Created on 2021-11-22 by the reprex package (v2.0.1)
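To make the intermediate step visible, here is a short sketch that stores the logical vector before using it for subsetting. The name has_other is only illustrative, and the code assumes the same example data as above.

```r
library(purrr)

df <- read.table(text = "A B C D
other one two three
one other two three
two three one other", header = TRUE)

# map_lgl returns a named logical vector, one element per column
has_other <- map_lgl(df, ~ any(.x == "other"))
has_other
#>     A     B     C     D 
#>  TRUE  TRUE FALSE  TRUE 

# Use the vector to keep only the matching columns
df[, has_other, drop = FALSE]
```

Storing the vector separately also makes it easy to invert the selection with !has_other if the non-matching columns are needed instead.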