Problem with dplyr string mutation of dataset

Time:12-07

I'm having trouble with a simple mutation of a data frame that looks like this:

  interaction alphabetical
1      A pp B         ABpp
2      A pp G         AGpp
3      G pp A         AGpp
4      A pp J         AJpp
5      J pp A         AJpp
6      Q pp A         AppQ

I want to use the alphabetical column to build a new interaction column in which the two letters appear in alphabetical order in every row. Example: AGpp -> A pp G

I attempted this by using this line:

d<-d%>%mutate(correct_order_interaction=paste(unlist(strsplit(as.character(alphabetical),""))[1],"pp",unlist(strsplit(as.character(alphabetical),""))[2]))

However, this results in this dataframe:

  interaction alphabetical correct_order_interaction
1      A pp B         ABpp                    A pp B
2      A pp G         AGpp                    A pp B
3      G pp A         AGpp                    A pp B
4      A pp J         AJpp                    A pp B
5      J pp A         AJpp                    A pp B
6      Q pp A         AppQ                    A pp B

I don't quite understand why this doesn't work. This may not be the best way of solving the problem, but I've done this before and it normally works just fine.

I hope someone can help me, and please let me know if there are better ways of approaching this problem :)

Thanks a lot in advance

dput of the data frame:

structure(list(interaction = c("A pp B", "A pp G", "G pp A", 
"A pp J", "J pp A", "Q pp A"), alphabetical = c("ABpp", "AGpp", 
"AGpp", "AJpp", "AJpp", "AppQ")), row.names = c(NA, 6L), class = "data.frame")

CodePudding user response:

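You could use str_match_all and map_chr:

library(tidyverse)  # loads dplyr, stringr and purrr

# df is the data frame from the question
df %>%
  mutate(
    correct = alphabetical %>%
      str_match_all("[A-Z]") %>%          # one matrix of uppercase letters per row
      map_chr(str_c, collapse = " pp ")   # join each row's letters with " pp "
  )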
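
As for why the original line gives "A pp B" in every row: inside mutate() the expression sees the whole alphabetical column at once, not one value at a time. strsplit() returns a list with one character vector per row, unlist() flattens that list into a single long vector, and [1] and [2] then pick the first two characters of the first row only. paste() builds one string from them, and that single string is recycled down the column. A small base R sketch of what happens (the names pieces, flat and wrong are only for illustration):

```r
# Data from the question's dput output
d <- data.frame(
  interaction  = c("A pp B", "A pp G", "G pp A", "A pp J", "J pp A", "Q pp A"),
  alphabetical = c("ABpp", "AGpp", "AGpp", "AJpp", "AJpp", "AppQ"),
  stringsAsFactors = FALSE
)

# strsplit() on a column returns a list: one character vector per row
pieces <- strsplit(as.character(d$alphabetical), "")

# unlist() flattens all rows into one long vector of single characters
flat <- unlist(pieces)

# [1] and [2] are therefore the first two characters of row 1 only
flat[1:2]

# paste() yields a single string, which is then recycled to every row
wrong <- paste(flat[1], "pp", flat[2])
wrong

# Even per row, positions 1 and 2 are not always the two letters:
# in "AppQ" the second character is a lowercase "p"
pieces[[6]][1:2]
```

So the expression has to be applied per row (rowwise() or a map function) or be written with functions that are vectorised over the column, and it should pick out the uppercase letters rather than fixed positions 1 and 2.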

CodePudding user response:

library(tidyverse) 

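# Takes a single string such as "AGpp" and returns "A pp G"
correct_order <- function(string) {
  # Drop the lowercase letters, then split what is left into single characters
  string_clean <- string %>% 
    str_remove_all("[a-z]") %>% 
    str_split("") %>% 
    unlist()
  
  # Join the first and last remaining letter around "pp"
  str_c(first(string_clean), "pp", last(string_clean), sep = " ")
}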

df %>%  
  rowwise() %>% 
  mutate(correct = correct_order(alphabetical))

# A tibble: 6 × 3
# Rowwise: 
  interaction alphabetical correct
  <chr>       <chr>        <chr>  
1 A pp B      ABpp         A pp B 
2 A pp G      AGpp         A pp G 
3 G pp A      AGpp         A pp G 
4 A pp J      AJpp         A pp J 
5 J pp A      AJpp         A pp J 
6 Q pp A      AppQ         A pp Q 

One-liner:

df %>% 
  mutate(correct = map_chr(alphabetical, ~
                             str_c(.x %>% 
                                     str_remove_all("[a-z]") %>% 
                                     str_split("") %>% 
                                     unlist() %>% 
                                     first(), 
                                   "pp",
                                   .x %>% 
                                     str_remove_all("[a-z]") %>% 
                                     str_split("") %>% 
                                     unlist() %>% 
                                     last(), 
                                   sep = " ")
                           ))
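
A fully vectorised alternative that needs neither rowwise() nor map_chr() might look like the sketch below. It uses only base R and assumes that every value in alphabetical contains exactly two uppercase letters, as in the sample data; the name letters_only is just for illustration.

```r
# Data from the question's dput output
d <- data.frame(
  interaction  = c("A pp B", "A pp G", "G pp A", "A pp J", "J pp A", "Q pp A"),
  alphabetical = c("ABpp", "AGpp", "AGpp", "AJpp", "AJpp", "AppQ"),
  stringsAsFactors = FALSE
)

# Remove every lowercase letter, leaving the two uppercase letters per row
letters_only <- gsub("[a-z]", "", d$alphabetical)

# substr() and paste() are vectorised, so this works on the whole column at once
d$correct_order_interaction <- paste(
  substr(letters_only, 1, 1),
  "pp",
  substr(letters_only, 2, 2)
)

d
```

Because gsub(), substr() and paste() all operate element by element, the same expression can also be used directly inside mutate().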