([.[:digit:]]+)
I expected this to match decimal numbers such as 25.8 or 0.6, but the match seems to stop at the non-digit character, so I only get 25 or 0.
I have tried escaping the "." with \. and \\. as well. I am doing this in R, using gregexpr().
Here is a minimal reproducible example:
> test
[1] " UNITS\n LAB 6690-2(LOINC) WBC # Bld Auto 10.99 "
> LABregexlabname
[1] "LAB[[:print:][:blank:]]+WBC[[:print:][:blank:]]+([\\.[:digit:]]+)[:blank:]*?"
> gregexpr( LABregexlabname, test)
[[1]]
[1] 11
attr(,"match.length")
[1] 46
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
> substring(test, 11, 11 + 46)
[1] "LAB 6690-2(LOINC) WBC # Bld Auto 10"
CodePudding user response:
Place the last [:blank:] inside [] as [[:blank:]] and use perl=TRUE.
test <- " UNITS\n LAB 6690-2(LOINC) WBC # Bld Auto 10.99 "
LABregexlabname <- "LAB[[:print:][:blank:]]+WBC[[:print:][:blank:]]+([.[:digit:]]+)[[:blank:]]*?"
regmatches(test, regexpr(LABregexlabname, test, perl=TRUE))
#[1] "LAB 6690-2(LOINC) WBC # Bld Auto 10.99"
It looks like TRE (R's default regex engine) switches to minimal matching everywhere when a lazy ? is used at the end. If you remove the ?, TRE also returns the whole number, but together with all the trailing spaces. So maybe drop [[:blank:]]* as well?
LABregexlabname <- "LAB[[:print:][:blank:]]+WBC[[:print:][:blank:]]+([.[:digit:]]+)[[:blank:]]*"
regmatches(test, regexpr(LABregexlabname, test))
#[1] "LAB 6690-2(LOINC) WBC # Bld Auto 10.99 "
LABregexlabname <- "LAB[[:print:][:blank:]]+WBC[[:print:][:blank:]]+([.[:digit:]]+)"
regmatches(test, regexpr(LABregexlabname, test))
#[1] "LAB 6690-2(LOINC) WBC # Bld Auto 10.99"
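If you only want the number itself rather than the whole "LAB ... 10.99" match, one option is regexec(), which also reports the parenthesised groups. This is only a sketch: wbc_pattern is a hypothetical pattern that is not from the posts above. It skips non-digits after "WBC" before capturing, because a greedy [[:print:][:blank:]]+ in front of the group can itself consume digits, so the group may end up holding less than the full number.

```r
# Sample string as shown in the question.
test <- " UNITS\n LAB 6690-2(LOINC) WBC # Bld Auto 10.99 "

# Hypothetical pattern (an assumption, not the asker's): after "WBC", skip
# anything that is not a digit, then capture digits with an optional
# fractional part.
wbc_pattern <- "WBC[^[:digit:]]*([[:digit:]]+(\\.[[:digit:]]+)?)"

# regexec() returns the whole match followed by each capture group;
# regmatches() turns those positions into the matched text.
m <- regmatches(test, regexec(wbc_pattern, test))

# Element 1 is the whole match, element 2 is the first capture group.
wbc_text <- m[[1]][2]
wbc_value <- as.numeric(wbc_text)
print(wbc_value)
```

Here m[[1]][2] holds the text of the first group, so as.numeric() is applied to the number only.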
CodePudding user response:
- We can use
x <- c("weight is 25.8 kg" , "distance is 0.06 km" ,
"tall 12.012 m")
gsub("\\D*([\\.[:digit:]]+).*", "\\1", x)
- Output
[1] "25.8" "0.06" "12.012"
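- If a string can hold several numbers, or none at all, gsub() is less convenient, since it returns one value per string and leaves a string unchanged when nothing matches. A possible alternative is gregexpr() with regmatches(). This is a minimal sketch; the sample strings and the name number_pattern are made up for illustration.

```r
# Illustrative input: one number, two numbers, and no number.
x <- c("weight is 25.8 kg", "between 1.5 and 20 m", "no number here")

# Digits with an optional fractional part (assumed number format:
# no sign, no exponent, no thousands separator).
number_pattern <- "[[:digit:]]+(\\.[[:digit:]]+)?"

# gregexpr() finds every match per string; regmatches() extracts the text.
found <- regmatches(x, gregexpr(number_pattern, x))

# Convert each element to numeric; strings without a match give numeric(0).
values <- lapply(found, as.numeric)
print(values)
```

The result is a list with one element per input string, so the numbers stay aligned with the strings they came from.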