R - Extracting numeric values from multiple txt files

I've been trying to extract certain values from multiple text files.

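# Read every matching file into a list of data frames
dataFiles <- lapply(Sys.glob("treedata*SAMPLE01*ID97*.txt"), read.csv, header = FALSE)
dataFiles

# Combine them column-wise into a single data frame
data <- data.frame(dataFiles)

# Keep only the rows whose first column starts with "DBHqsm"
data2 <- data[grepl("^DBHqsm", data$V1), ]
data2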

So far this gives me the data.frame of character strings shown below. I now want to extract just the numbers, including the decimal point. I tried regmatches and gregexpr, but that removes the ".".

               V1            V1.1            V1.2            V1.3            V1.4
13 DBHqsm\t 0.05145 DBHqsm\t 0.05189 DBHqsm\t 0.05245 DBHqsm\t 0.05049 DBHqsm\t 0.05393
              V1.5           V1.6            V1.7
13 DBHqsm\t 0.05126 DBHqsm\t 0.0506 DBHqsm\t 0.04977

Thanks for the help!

CodePudding user response：

The following regex removes all non-digit characters that precede a run of digits followed by a dot.

We use a lookahead assertion, (?=...), so the digits and the dot are checked but not removed. Note that this requires perl = TRUE.

It should work if your data always follows the pattern you have shown, for example:

gsub(x = "DBHqsm\t 0.05126", pattern = "\\D*(?=\\d*?\\.)", replacement = "", perl = TRUE)
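The result of gsub is still a character string, so wrapping the call in as.numeric() is probably what you want in the end. As a rough sketch (the variable names here are only illustrative), this also shows an alternative with regmatches and regexpr, using a pattern that explicitly allows the decimal point:

```r
x <- "DBHqsm\t 0.05126"

# Strip everything in front of the number, then convert to numeric
num_gsub <- as.numeric(gsub("\\D*(?=\\d*?\\.)", "", x, perl = TRUE))

# Alternative: pull the number out directly; "\\.?" keeps the decimal point
num_match <- as.numeric(regmatches(x, regexpr("[0-9]+\\.?[0-9]*", x)))

print(num_gsub)
print(num_match)
```

Both approaches assume the label itself contains no digits and that there is exactly one number per string.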

And, of course, you can apply it to every column of a data frame:

data.frame(
  lapply(iris, function(x) {
    # iris is only a stand-in; use your own data frame (e.g. data2) here
    gsub(x = x, pattern = "\\D*(?=\\d*?\\.)", replacement = "", perl = TRUE)
  })
)
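For data shaped like yours, something along these lines might be closer to what you need. This is a sketch only: the data2 below is a hand-built stand-in using three of the values from your output, and it assumes a single row with one value per column.

```r
# Hypothetical stand-in for the asker's data2: one row of character columns
data2 <- data.frame(
  V1   = "DBHqsm\t 0.05145",
  V1.1 = "DBHqsm\t 0.05189",
  V1.2 = "DBHqsm\t 0.05245",
  stringsAsFactors = FALSE
)

# Strip the label from each column and convert what is left to numeric.
# vapply with numeric(1) assumes exactly one value per column.
dbh <- vapply(
  data2,
  function(x) as.numeric(gsub("\\D*(?=\\d*?\\.)", "", x, perl = TRUE)),
  numeric(1)
)

print(dbh)
```

The result is a named numeric vector, one element per file, which can go straight into mean(), summary(), and so on. With more than one matching row per file, lapply or sapply would be the safer choice than vapply.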