I have a dataset with several columns that are mutually exclusive. One of the columns contains the answer for that observation (e.g. "abc") while the rest of columns contains negative numbers that represent NA. As column A in the toy dataset below, a column always contains the same answer .
I want to create a new variable by merging the column with the answer for each observation. So far, I did it as below, but in the real dataset there are 50 columns for 2000 observations, so I'm looking for a more optimized (and more elegant) way to do it.
data <- tibble::tribble(
~id, ~A, ~B, ~C,
"a", "ES01", "-1", "-2",
"b", "-1","-3", "CH041",
"c", "-2", "DDE24", "-1",
"d", "ES01", "-3", "-1"
)
data %>%
dplyr::mutate(across(A:C, ~ ifelse(str_starts(.,"-"), "", .))) %>%
dplyr::mutate(regions = paste0(A, B, C))
Thank you.
CodePudding user response:
We may use unite
from tidyr
and then remove the numeric/- characters with str_remove_all
library(dplyr)
library(tidyr)
library(stringr)
data %>%
unite(regions, A:C, remove = FALSE) %>%
mutate(regions = str_remove_all(regions, "[^a-z] "))
-output
# A tibble: 4 × 5
id regions A B C
<chr> <chr> <chr> <chr> <chr>
1 a abc abc -1 -2
2 b def -1 -3 def
3 c ghi -2 ghi -1
4 d abc abc -3 -1
Using the updated dataset
data %>%
unite(regions, A:C, remove = FALSE) %>%
mutate(regions = str_remove_all(regions, "_?-[0-9] _?"))
-output
# A tibble: 4 × 5
id regions A B C
<chr> <chr> <chr> <chr> <chr>
1 a ES01 ES01 -1 -2
2 b CH041 -1 -3 CH041
3 c DDE24 -2 DDE24 -1
4 d ES01 ES01 -3 -1
CodePudding user response:
You can use tidyr::pivot_longer
to transform your data into long format and then use filter
afterwards:
data %>%
pivot_longer(-id, names_to = "col", values_to = "answer") %>%
filter(!grepl("\\d", answer))
# A tibble: 4 x 3
id col answer
<chr> <chr> <chr>
1 a A abc
2 b C def
3 c B ghi
4 d A abc