R - Extracting numeric values from multiple txt files

Time:05-12

I've been trying to extract certain values from multiple text files.

dataFiles<-lapply(Sys.glob("treedata*SAMPLE01*ID97*.txt"),read.csv,header=FALSE)
dataFiles

data<-data.frame(dataFiles)

data[grepl("^DBHqsm",data$V1),]
data2<-data[grepl("^DBHqsm",data$V1),]
data2

So far this gives me the following data.frame of character strings. I now want to extract just the numbers from it, including the decimal point. I tried using regmatches and gregexpr, but that removes the decimal point.

               V1            V1.1            V1.2            V1.3            V1.4
13 DBHqsm\t 0.05145 DBHqsm\t 0.05189 DBHqsm\t 0.05245 DBHqsm\t 0.05049 DBHqsm\t 0.05393
              V1.5           V1.6            V1.7
13 DBHqsm\t 0.05126 DBHqsm\t 0.0506 DBHqsm\t 0.04977

Thanks for the help!

CodePudding user response:

The following regex removes all non-numeric characters that precede a run of digits followed by a dot.

We use a lookahead assertion, (?=...). Note that this requires perl = TRUE.

It should work if your data always follows the pattern you have shown, for example:

gsub(x = "DBHqsm\t 0.05126", pattern = "\\D*(?=\\d*?\\.)", replacement = "", perl = TRUE)
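As a quick check, the same pattern can be run over several of the values printed in the question and the result converted to numbers. This is only a sketch: the vector `x` is typed in by hand from the question's output, and `cleaned` and `values` are illustrative names.

```r
# Sample strings copied by hand from the question's output
# (a tab and a space sit between the label and the value)
x <- c("DBHqsm\t 0.05145", "DBHqsm\t 0.0506", "DBHqsm\t 0.04977")

# Strip the non-digit prefix that precedes the digits before the decimal point
cleaned <- gsub(pattern = "\\D*(?=\\d*?\\.)", replacement = "", x = x, perl = TRUE)

# gsub() returns character strings, so convert explicitly to numeric
values <- as.numeric(cleaned)
print(values)
```

Note that gsub() only cleans the strings; without the as.numeric() step the values remain character data.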

And, of course, you can apply it to every column of a data frame:

data.frame(
  lapply(iris, function(x) {
    gsub(x = x, pattern = "\\D*(?=\\d*?\\.)", replacement = "", perl = TRUE)
  })
)
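Since the question mentions that regmatches dropped the decimal point, here is a sketch of an alternative that keeps it by putting the dot inside the matched pattern. The `data2` below is a hypothetical stand-in built from three of the values in the question, not the real data, and `strings`, `matches` and `dbh` are illustrative names.

```r
# Hypothetical stand-in for data2 from the question: one row, one column per file
data2 <- data.frame(
  V1   = "DBHqsm\t 0.05145",
  V1.1 = "DBHqsm\t 0.05189",
  V1.2 = "DBHqsm\t 0.0506",
  stringsAsFactors = FALSE
)

# Flatten the single row into a named character vector
strings <- unlist(data2[1, ])

# Match an optional integer part, an optional dot, and the trailing digits,
# so the decimal point stays inside the match.
# regmatches() drops elements with no match; every string here contains a number.
matches <- regmatches(strings, regexpr("[0-9]*\\.?[0-9]+", strings))

# Convert to numeric and keep the original column names
dbh <- as.numeric(matches)
names(dbh) <- names(data2)
print(dbh)
```

This uses regexpr rather than gregexpr because each string is assumed to hold exactly one number. If some strings might hold no number at all, the lengths would no longer line up and the names would need to be assigned differently.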