I have one column that has important identifiers, but they are not uniform, so I am stuck with string splitting. I guess an example is best.
Code example, "df" is the tibble as is, result the result
a <- c('Text1','Text1','Text1','Text1_sub1','Text1_sub2','Text2','Text2','Text2_sub1_sub2','Text2_sub3')
b <- c("0" ,"0" ,"0" ,NA ,NA ,"0" ,"0" ,NA ,NA)
df <- tibble(a,b)
a <- c('Text1','Text1','Text1','Text1' ,'Text1' ,'Text2','Text2','Text2' ,'Text2')
b <- c("0" ,"0" ,"0" ,"sub1" ,"sub2" ,"0" ,"0" ,"sub1_sub2" ,"sub3")
result <- tibble(a,b)
In Different formating:
> df
# A tibble: 9 x 2
a b
<chr> <chr>
1 Text1 0
2 Text1 0
3 Text1 0
4 Text1_sub1 NA
5 Text1_sub2 NA
6 Text2 0
7 Text2 0
8 Text2_sub1_sub2 NA
9 Text2_sub3 NA
> result
# A tibble: 9 x 2
a b
<chr> <chr>
1 Text1 0
2 Text1 0
3 Text1 0
4 Text1 sub1
5 Text1 sub2
6 Text2 0
7 Text2 0
8 Text2 sub1_sub2
9 Text2 sub3
I want to use column as identifier, therefore I would like to take the unique strings in column a where b == 0, and then substract those from column a where b != 0, and transfer it to b. Any help highly appreciated, tidy solution would be nice, but any pointer will do, thanks!
CodePudding user response:
We can use separate
:
df %>%
separate(a,
into = c("a", "b_new"),
sep = "_",
extra = "merge") %>%
mutate(b = coalesce(b, b_new)) %>%
select(-b_new)
# A tibble: 9 × 2
a b
<chr> <chr>
1 Text1 0
2 Text1 0
3 Text1 0
4 Text1 sub1
5 Text1 sub2
6 Text2 0
7 Text2 0
8 Text2 sub1_sub2
9 Text2 sub3
CodePudding user response:
How about this:
library(dplyr)
library(glue)
a <- c('Text1','Text1','Text1','Text1_sub1','Text1_sub2','Text2','Text2','Text2_sub1_sub2','Text2_sub3')
b <- c("0" ,"0" ,"0" ,NA ,NA ,"0" ,"0" ,NA ,NA)
df <- tibble(a,b)
unstr <- df %>%
filter(b == "0") %>%
select(a) %>%
distinct() %>%
pull %>%
glue_collapse(sep="|")
result <- df %>%
mutate(b = case_when(is.na(b) ~ gsub(glue("{unstr}_(.*)"), "\\1", a),
TRUE ~ b),
a = gsub(glue("({unstr})_.*"), "\\1", a))
result
#> # A tibble: 9 × 2
#> a b
#> <chr> <chr>
#> 1 Text1 0
#> 2 Text1 0
#> 3 Text1 0
#> 4 Text1 _sub1
#> 5 Text1 _sub2
#> 6 Text2 0
#> 7 Text2 0
#> 8 Text2 sub1_sub2
#> 9 Text2 sub3
Created on 2022-11-23 by the reprex package (v2.0.1)