How to use stringr to only keep a column with matching text

My data:

bipol <- structure(list(x = c(-32.2473803623946, -10.3430055552535,
                     -10.4110625155105, -30.6804086316593), 
     y = c(-1.04361388101641, 24.6971017038231, 
            24.6303839929497, 35.7958624586036), 
     z = c(202.270724194289, 228.921139241279,
            226.240853533147, 232.326865994258), 
     ...4 = c(0, 0, 0, 0), ...5 = c(0, 0, 0, 0), 
     ...6 = c(0, 0, 0, 0), ...7 = c(1, 1, 1, 1), 
     ...8 = c(1, 1, 1, 1), ...9 = c(1, 1, 1, 1), 
     ...10 = c(1, 1, 1, 1), 
     ...11 = c("Point # 1 in 1-LV_TC_EDIT", 
               "Point # 2 in 1-LV_TC_EDIT", 
               "Point # 3 in 1-LV_TC_EDIT", 
               "Point # 5 in 1-LV_TC_EDIT"), 
     ...12 = c("Bipolar 7.827 / Unipolar 16.911 / LAT -9.0", 
            "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0",
            "Bipolar 1.974 / Unipolar 9.219 / LAT -11.0", 
            "Bipolar 1.938 / Unipolar 10.572 / LAT -9.0")), 
     row.names = c(NA, -4L), 
     class = c("tbl_df", "tbl", "data.frame"))

I'm trying to only keep a column if it contains certain text.

This code flags the correct column as TRUE but gives an atomic vector warning:

bipol %>% stringr::str_detect(., "Bipolar")

Warning: argument is not an atomic vector; coercing
[1] FALSE ...  TRUE

That warning led me to the question "using select and stringr together", but I'm not sure how to incorporate it into my code.

But when I use it inside select(where(...)), I get the error that where() must be used with functions that return TRUE or FALSE.

bipol %>% select(where(~ stringr::str_detect(., "Bipolar")))

Error in `select()`:
! `where()` must be used with functions that return `TRUE` or `FALSE`.
Backtrace:
  1. bipol %>% ...
  3. dplyr:::select.data.frame(., where(~stringr::str_detect(., "Bipolar")))
  6. tidyselect::eval_select(expr(c(...)), .data)
  7. tidyselect:::eval_select_impl(...)
 16. tidyselect:::vars_select_eval(...)
     ...
 19. tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
 20. tidyselect:::walk_data_tree(new, data_mask, context_mask)
 21. tidyselect:::as_indices_sel_impl(...)
 23. purrr::map_lgl(data, predicate)
 24. tidyselect (local) .f(.x[[i]], ...)

I will then cbind() just this column to the first 3 columns and extract the relevant text using stringr::str_extract.

Thanks!

Answer:

Within select(), the function passed to where() must return a single TRUE/FALSE for each column, which decides whether that column is kept. str_detect() is applied to each column to check for 'Bipolar', but it returns a logical vector as long as the column, with one TRUE/FALSE per row. Wrapping it in any() collapses that vector to a single TRUE if any value matches and FALSE if nothing matches.

library(dplyr)
library(stringr)
bipol %>% 
    select(where(~ any(stringr::str_detect(.x, "Bipolar"))))

-output

# A tibble: 4 × 1
  ...12                                     
  <chr>                                     
1 Bipolar 7.827 / Unipolar 16.911 / LAT -9.0
2 Bipolar 2.34 / Unipolar 9.09 / LAT -10.0  
3 Bipolar 1.974 / Unipolar 9.219 / LAT -11.0
4 Bipolar 1.938 / Unipolar 10.572 / LAT -9.0
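
To see why any() is needed, it can help to look at what the predicate returns for each column. This is a minimal sketch on a small made-up tibble called toy (a stand-in, not the original bipol data); the names toy, per_row, per_col and kept are only for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for `bipol`
toy <- tibble(
  x    = c(1.5, 2.5),
  note = c("Bipolar 7.8 / LAT -9.0", "Bipolar 2.3 / LAT -10.0")
)

# Without any(): one TRUE/FALSE per row, which where() cannot use
per_row <- str_detect(toy$note, "Bipolar")
length(per_row)  # same as nrow(toy)

# With any(): a single TRUE/FALSE for each column
per_col <- vapply(toy, function(col) any(str_detect(col, "Bipolar")), logical(1))
per_col

# The same predicate inside select(where(...))
kept <- toy %>% select(where(~ any(str_detect(.x, "Bipolar"))))
names(kept)
```

Only the column whose per-column value is TRUE survives the select().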

We can also add a short-circuit check so that 'Bipolar' is only searched for in character columns:

bipol %>% 
  select(where(~ is.character(.x) && any(stringr::str_detect(.x, "Bipolar"))))
# A tibble: 4 × 1
  ...12                                     
  <chr>                                     
1 Bipolar 7.827 / Unipolar 16.911 / LAT -9.0
2 Bipolar 2.34 / Unipolar 9.09 / LAT -10.0  
3 Bipolar 1.974 / Unipolar 9.219 / LAT -11.0
4 Bipolar 1.938 / Unipolar 10.572 / LAT -9.0
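
One reason the is.character() guard can matter: str_detect() coerces numeric columns to text, so a pattern made of digits could match a numeric column as well. The sketch below uses a made-up tibble toy and the pattern "10" purely as an illustration; neither comes from the original data.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: a numeric column whose values print as "10", "20"
toy <- tibble(
  id   = c(10, 20),
  note = c("LAT -10.0", "LAT -9.0")
)

# Without the guard, the numeric column is coerced to text and matches too
unguarded <- toy %>% select(where(~ any(str_detect(.x, "10"))))
names(unguarded)

# With the guard, && stops at is.character() for non-character columns,
# so str_detect() is never called on them
guarded <- toy %>%
  select(where(~ is.character(.x) && any(str_detect(.x, "10"))))
names(guarded)
```

With a pattern like "Bipolar" a numeric column cannot match, so the guard is mostly a safety net there.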

The error message is a bit misleading, but in essence

> bipol %>% select(where(~ stringr::str_detect(., "Bipolar")))
Error in `select()`:
! `where()` must be used with functions that return `TRUE` or `FALSE`.

It means the function must return a single TRUE/FALSE for each column, not a logical vector of TRUE/FALSE values of length > 1.
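
For the follow-up step mentioned in the question (keeping the first 3 columns alongside the matching column and then extracting text), one possible approach is to do the selection in a single select() call instead of cbind(). This is only a sketch on a made-up tibble toy; the column name note, the new column bipolar, and the regular expression are assumptions for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data shaped like the question's tibble
toy <- tibble(
  x    = c(-32.2, -10.3),
  y    = c(-1.0, 24.7),
  z    = c(202.3, 228.9),
  flag = c(0, 0),
  note = c("Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
           "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0")
)

result <- toy %>%
  # keep columns 1 to 3 plus any character column mentioning "Bipolar"
  select(1:3, where(~ is.character(.x) && any(str_detect(.x, "Bipolar")))) %>%
  # extract the number that follows "Bipolar " and convert it to numeric
  mutate(bipolar = as.numeric(str_extract(note, "(?<=Bipolar )[0-9.]+")))

result
```

Here the matching column is referred to by name in mutate(); with the original data that would be the column shown as ...12 in the output above.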