R: how to shorten a value to just one number inside the value

Time:03-22

Chromosome_name             Start Position
CHR_HSCHR7_2_CTG6           142857940
CHR_HSCHR19LRC_PGF2_CTG3_1  54316049

I have just started using R. I have a data frame of chromosome names, and I want to replace each long name with just the chromosome number, e.g. CHR_HSCHR19LRC_PGF2_CTG3_1 would become "19". In other words, I need to replace the long name with the number that comes right after the characters "HSCHR". How would I do this?

I tried manually entering the replacement value: gsub(".*HSCHR19", "19", dataframe)

But this takes far too long for a list of >100 values. I would like to find a way to do this automatically.

CodePudding user response:

You can use

sub('^.*CHR(\\d+).*$', '\\1', Chromosome_name)
#> [1] "7"  "19"
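
To apply this to a column of a data frame rather than a free-standing vector, a minimal sketch might look like the following. The data frame name df and the column name Start_Position are assumptions for illustration, not taken from the question, and the pattern is anchored on "HSCHR" as the question describes. Note that sub() returns its input unchanged when the pattern does not match, so a name without "HSCHR" followed by digits would be left as-is.

```r
# Hypothetical data frame mirroring the two example rows from the question
df <- data.frame(
  Chromosome_name = c("CHR_HSCHR7_2_CTG6", "CHR_HSCHR19LRC_PGF2_CTG3_1"),
  Start_Position  = c(142857940, 54316049)
)

# Capture the digits that directly follow "HSCHR" and keep only that group;
# everything before and after the digits is matched and discarded
df$Chromosome_name <- sub("^.*HSCHR(\\d+).*$", "\\1", df$Chromosome_name)

df$Chromosome_name
#> [1] "7"  "19"
```

Because the replacement is vectorised over the whole column, it should not matter whether there are two names or several hundred.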

CodePudding user response:

Another potential option is a look-behind regex, e.g.

library(tidyverse)

df <- read.table(text = "Chromosome_name    Start_Position
CHR_HSCHR7_2_CTG6   142857940
CHR_HSCHR19LRC_PGF2_CTG3_1  54316049", header = TRUE)

df2 <- df %>%
  mutate(Chromosome_name = str_extract(Chromosome_name, "(?<=HSCHR)\\d+"))

df2
#>   Chromosome_name Start_Position
#> 1               7      142857940
#> 2              19       54316049

Created on 2022-03-22 by the reprex package (v2.0.1)
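
If loading the tidyverse is not desirable, the same look-behind idea can probably be expressed in base R as well. The sketch below assumes the names are held in a character vector called chromosome_names (a name made up for this example) and uses regexpr() with perl = TRUE, which is what enables look-behind syntax in base R.

```r
# Hypothetical input vector with the two names from the question
chromosome_names <- c("CHR_HSCHR7_2_CTG6", "CHR_HSCHR19LRC_PGF2_CTG3_1")

# Locate the first run of digits preceded by "HSCHR" in each element;
# perl = TRUE is required for the (?<=...) look-behind
matches <- regexpr("(?<=HSCHR)\\d+", chromosome_names, perl = TRUE)

# Pull out only the matched substrings
chromosome_numbers <- regmatches(chromosome_names, matches)

chromosome_numbers
#> [1] "7"  "19"
```

One caveat: regmatches() drops elements that have no match, so the result can be shorter than the input if some names lack "HSCHR" followed by digits. str_extract() returns NA in that case, which keeps the result aligned with the rows of the data frame.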
