Sort strings based on string patterns in R

Time:09-28

I have a data frame that looks like df below.
I want to sort the genes column so that the rows whose gene name starts with the AT1G... pattern come first.

library(tidyverse)

df <- tibble(genes=c("18S","ACLA","AT1G25240","AT1G25241","AT1G25242"), functions=c("ribosome","dunno","flowering","O2","photosynthesis"))
df
#> # A tibble: 5 × 2
#>   genes     functions     
#>   <chr>     <chr>         
#> 1 18S       ribosome      
#> 2 ACLA      dunno         
#> 3 AT1G25240 flowering     
#> 4 AT1G25241 O2            
#> 5 AT1G25242 photosynthesis

Created on 2022-09-28 with reprex v2.0.2

I want my data to look like this:

genes       functions
AT1G25240    flowering
AT1G25241    O2
AT1G25242    photosynthesis
ACLA         dunno
18S          ribosome

Any ideas or help would be highly appreciated! The rationale is that, in a huge data set, I want to see the core genes that start with AT.. first.

CodePudding user response:

If you sort (arrange) by the presence of the pattern using grepl, then FALSE (pattern not found) sorts first. If we negate the result of grepl, the rows that match the pattern come first, which is what you want:

df %>%
  arrange(!grepl("^AT1G", genes))
# # A tibble: 5 x 2
#   genes     functions     
#   <chr>     <chr>         
# 1 AT1G25240 flowering     
# 2 AT1G25241 O2            
# 3 AT1G25242 photosynthesis
# 4 18S       ribosome      
# 5 ACLA      dunno         
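
To see why the negation is needed, it may help to look at the intermediate logical vector. The sketch below uses only base R and the gene names from the question; the variable names is_core and ord are illustrative and not part of the original answer.

```r
genes <- c("18S", "ACLA", "AT1G25240", "AT1G25241", "AT1G25242")

# TRUE where the gene name starts with AT1G
is_core <- grepl("^AT1G", genes)

# Logical values sort as FALSE < TRUE, so negating the vector
# puts the matching genes first. order() leaves ties in their
# original order, so each group keeps its input order.
ord <- order(!is_core)
genes[ord]
```

The same index could be used on the whole data frame, e.g. df[ord, ], if you prefer to avoid dplyr.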

You can add other arguments to arrange for secondary sorts, e.g., arrange(!grepl(..), genes, functions).
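
As a minimal sketch of such a secondary sort, assuming the same df as in the question (the name sorted is only illustrative), the gene name can be used to break ties within the matching and non-matching groups:

```r
library(dplyr)

df <- tibble(
  genes = c("18S", "ACLA", "AT1G25240", "AT1G25241", "AT1G25242"),
  functions = c("ribosome", "dunno", "flowering", "O2", "photosynthesis")
)

# Primary key: FALSE (gene matches ^AT1G) sorts before TRUE (no match).
# Secondary key: the gene name itself, ascending within each group.
sorted <- df %>%
  arrange(!grepl("^AT1G", genes), genes)

sorted
```

With this data the secondary key does not change the result, because the rows are already in ascending order within each group; it matters once the input is unsorted.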
