Home > other >  Mutate one column by filtering others in dplyr
Mutate one column by filtering others in dplyr

Time:02-24

I have a dataset with several columns that are mutually exclusive. One of the columns contains the answer for that observation (e.g. "abc") while the rest of columns contains negative numbers that represent NA. As column A in the toy dataset below, a column always contains the same answer .

I want to create a new variable by merging the column with the answer for each observation. So far, I did it as below, but in the real dataset there are 50 columns for 2000 observations, so I'm looking for a more optimized (and more elegant) way to do it.

data <- tibble::tribble(
  ~id, ~A, ~B, ~C,
  "a", "ES01", "-1", "-2",
  "b", "-1","-3", "CH041",
  "c", "-2", "DDE24", "-1",
  "d", "ES01", "-3", "-1"
)

data %>% 
  dplyr::mutate(across(A:C, ~ ifelse(str_starts(.,"-"), "", .))) %>%
  dplyr::mutate(regions = paste0(A, B, C))

Thank you.

CodePudding user response:

We may use unite from tidyr and then remove the numeric/- characters with str_remove_all

library(dplyr)
library(tidyr)
library(stringr)
data %>%
  unite(regions, A:C, remove = FALSE) %>%
  mutate(regions = str_remove_all(regions, "[^a-z] "))

-output

# A tibble: 4 × 5
  id    regions A     B     C    
  <chr> <chr>   <chr> <chr> <chr>
1 a     abc     abc   -1    -2   
2 b     def     -1    -3    def  
3 c     ghi     -2    ghi   -1   
4 d     abc     abc   -3    -1   

Using the updated dataset

data %>%
   unite(regions, A:C, remove = FALSE) %>% 
   mutate(regions = str_remove_all(regions, "_?-[0-9] _?"))

-output

# A tibble: 4 × 5
  id    regions A     B     C    
  <chr> <chr>   <chr> <chr> <chr>
1 a     ES01    ES01  -1    -2   
2 b     CH041   -1    -3    CH041
3 c     DDE24   -2    DDE24 -1   
4 d     ES01    ES01  -3    -1   

CodePudding user response:

You can use tidyr::pivot_longer to transform your data into long format and then use filter afterwards:

data %>%
  pivot_longer(-id, names_to = "col", values_to = "answer") %>%
  filter(!grepl("\\d", answer))

# A tibble: 4 x 3
  id    col   answer
  <chr> <chr> <chr> 
1 a     A     abc   
2 b     C     def   
3 c     B     ghi   
4 d     A     abc  
  • Related