I want to change the rownames of cov_stats
, such that it contains a substring of the FileName
column values. I only want to retain the string that begins with "SRR" followed by 8 digits (e.g., SRR18826803).
cov_list <- list.files(path="./stats/", full.names=T)
cov_stats <- rbindlist(sapply(cov_list, fread, simplify=F), use.names=T, idcol="FileName")
rownames(cov_stats) <- gsub("^\.\/\SRR*_\stats.\txt", "SRR*", cov_stats[["FileName"]])
Second attempt
rownames(cov_stats) <- gsub("^SRR[:digit:]*", "", cov_stats[["FileName"]])
Original strings
> cov_stats[["FileName"]]
[1] "./stats/SRR18826803_stats.txt" "./stats/SRR18826804_stats.txt"
[3] "./stats/SRR18826805_stats.txt" "./stats/SRR18826806_stats.txt"
[5] "./stats/SRR18826807_stats.txt" "./stats/SRR18826808_stats.txt"
Desired substring output
[1] "SRR18826803" "SRR18826804"
[3] "SRR18826805" "SRR18826806"
[5] "SRR18826807" "SRR18826808"
CodePudding user response:
Would this work for you?
library(stringr)
stringr::str_extract(cov_stats[["FileName"]], "SRR.{0,8}")
CodePudding user response:
You can use
rownames(cov_stats) <- sub("^\\./stats/(SRR\\d{8}).*", "\\1", cov_stats[["FileName"]])
See the regex demo. Details:
^
- start of string\./stats/
-./stats/
string(SRR\d{8})
- Group 1 (\1
):SRR
string and then eight digits.*
- the rest of the string till its end.
Note that sub
is used (not gsub
) because there is only one expected replacement operation in the input string (since the regex matches the whole string).
See the R demo:
cov_stats <- c("./stats/SRR18826803_stats.txt", "./stats/SRR18826804_stats.txt", "./stats/SRR18826805_stats.txt", "./stats/SRR18826806_stats.txt", "./stats/SRR18826807_stats.txt")
sub("^\\./stats/(SRR\\d{8}).*", "\\1", cov_stats)
## => [1] "SRR18826803" "SRR18826804" "SRR18826805" "SRR18826806" "SRR18826807"
An equivalent extraction stringr
approach:
library(stringr)
rownames(cov_stats) <- str_extract(cov_stats[["FileName"]], "SRR\\d{8}")