How to extract the last text after forward slash

Time:06-23

I have a df that looks like this:

AF GT Sample_name
0.001 1/1 path/to/sample/name/ID0001.vcf.gz
0.005 0/1 path/to/sample/name/ID0002.vcf.gz

What I want is to only keep the ID name in the Sample_name column:

AF GT Sample_name
0.001 1/1 ID0001
0.005 0/1 ID0002

I would very much appreciate any help in achieving this.

CodePudding user response:

There are some built-in file name helpers that you can use here:

  • basename()
  • tools::file_path_sans_ext()

So in this example simply do:

library(tools)

df$Sample_name <- file_path_sans_ext(basename(df$Sample_name), compression = TRUE)
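To see what each helper contributes, here is a minimal sketch on a standalone vector. The names `paths`, `files`, and `ids` are illustrative and do not appear in the question.

```r
library(tools)

# Illustrative input, copied from the question's Sample_name column
paths <- c("path/to/sample/name/ID0001.vcf.gz",
           "path/to/sample/name/ID0002.vcf.gz")

# Step 1: basename() drops everything up to and including the last "/"
files <- basename(paths)

# Step 2: with compression = TRUE a trailing .gz, .bz2 or .xz is removed
# first, and then the remaining extension (.vcf here) is removed
ids <- file_path_sans_ext(files, compression = TRUE)

print(files)
print(ids)
```

Without compression = TRUE only the final .gz would be stripped, leaving ID0001.vcf.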

CodePudding user response:

You can use a regex pattern with gsub():

gsub(".*(ID\\d*).*", replacement = "\\1", x = "path/to/sample/name/ID0001.vcf.gz")
#> "ID0001"

Across your dataframe:

df$sample_name2 <- gsub(".*(ID\\d*).*", replacement = "\\1", x = df$Sample_name)
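If the file names do not always start with ID, the pattern can be anchored on the last forward slash instead. The following is a sketch under that assumption and not part of the answer above. `paths` is an illustrative vector, and its second entry is made up for the example.

```r
# Illustrative input; the second path is hypothetical
paths <- c("path/to/sample/name/ID0001.vcf.gz",
           "other/dir/sampleA.vcf.gz")

# "^.*/"     greedily consumes everything up to the last "/"
# "([^/.]+)" captures the text after it, stopping at the first "."
# "[^/]*$"   consumes the remaining extensions up to the end of the string
ids <- sub("^.*/([^/.]+)[^/]*$", "\\1", paths)

print(ids)
```

A string that contains no "/" does not match this pattern, so sub() returns it unchanged.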

CodePudding user response:

Here is a tidyverse solution. Note that it only works if the ID string is always ID followed by four digits:

library(dplyr)
library(stringr)

df %>%
  mutate(Sample_name = str_extract(Sample_name, 'ID\\d{4}'))

Output:

     AF  GT Sample_name
1 0.001 1/1      ID0001
2 0.005 0/1      ID0002
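If the number of digits can vary, the fixed {4} can be relaxed to "one or more digits". Below is a rough base R sketch of the same extraction, assuming every ID is the letters ID followed by at least one digit. `paths` and `ids` are illustrative names, and the second path is made up to show a longer ID.

```r
# Illustrative input; the second path is hypothetical
paths <- c("path/to/sample/name/ID0001.vcf.gz",
           "path/to/sample/name/ID12345.vcf.gz")

# regexpr() finds the first match of "ID" followed by one or more digits
m <- regexpr("ID\\d+", paths)

# regmatches() returns only the matched elements, so start from NA and
# fill in the positions where a match was found
ids <- rep(NA_character_, length(paths))
ids[m != -1] <- regmatches(paths, m)

print(ids)
```

Elements without a match stay NA, which mirrors what str_extract() returns in that case.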

CodePudding user response:

Use basename() to drop the directory part, then sub() to remove everything from the first dot onward:

df$Sample_name <- sub('\\..*$', '', basename(df$Sample_name))
df

Output:

     AF  GT Sample_name
1 0.001 1/1      ID0001
2 0.005 0/1      ID0002

Data

df <- data.frame(AF = c(0.001, 0.005),
                 GT = c("1/1", "0/1"),
                 Sample_name = c("path/to/sample/name/ID0001.vcf.gz", "path/to/sample/name/ID0002.vcf.gz"))