I've been trying to extract certain values from multiple text files.
dataFiles <- lapply(Sys.glob("treedata*SAMPLE01*ID97*.txt"), read.csv, header = FALSE)
dataFiles
data <- data.frame(dataFiles)
data[grepl("^DBHqsm", data$V1), ]
data2 <- data[grepl("^DBHqsm*", data$V1), ]
data2
So far this gives me the data.frame of character strings shown below. I now want to extract just the numbers, including the decimal point. I tried regmatches and gregexpr, but that removes the decimal point.
V1 V1.1 V1.2 V1.3 V1.4
13 DBHqsm\t 0.05145 DBHqsm\t 0.05189 DBHqsm\t 0.05245 DBHqsm\t 0.05049 DBHqsm\t 0.05393
V1.5 V1.6 V1.7
13 DBHqsm\t 0.05126 DBHqsm\t 0.0506 DBHqsm\t 0.04977
Thanks for the help!
CodePudding user response:
The following regex removes all non-numeric characters that precede any number of digits followed by a dot.
We use a lookahead assertion, (?=...), so the digits and the dot are checked but not removed.
Note that lookaheads require perl = TRUE.
It should work if your data always follows the pattern you have shown, for example:
gsub(x = "DBHqsm\t 0.05126", pattern = "\\D*(?=\\d*?\\.)", replacement = "", perl = TRUE)
And, of course, you can apply it to every column of a data frame (iris is used here only as a stand-in for your own data):
data.frame(
  lapply(iris, function(x) {
    # strip the non-numeric prefix from every value in the column
    gsub(x = x, pattern = "\\D*(?=\\d*?\\.)", replacement = "", perl = TRUE)
  })
)
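If you need actual numbers rather than character strings, the cleaned text can be passed through as.numeric afterwards. This is only a minimal sketch: the vector `x` is made-up sample data shaped like the cells in the question, and `cleaned` and `values` are illustrative names, not anything from your files.

```r
# Hypothetical sample mirroring the cells shown in the question
x <- c("DBHqsm\t 0.05145", "DBHqsm\t 0.0506")

# Remove everything before the number (same pattern as above)
cleaned <- gsub(x = x, pattern = "\\D*(?=\\d*?\\.)", replacement = "", perl = TRUE)

# Convert the remaining strings to numeric values
values <- as.numeric(cleaned)
values
```

as.numeric returns NA with a warning for any string that is not a valid number, so an unexpected NA is a quick sign that some cell did not match the assumed pattern.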