Why does this regex not match decimal numbers?

Time:09-07

([.[:digit:]]+)

I think this should match decimal numbers like 25.8 or 0.6, but it seems to give up at the non-digit part of the match, so I only get 25 or 0.

I have tried escaping the "." as \. and as \\. I am doing this in R, using gregexpr().

Here is a minimal reproducible example:

> test
[1] "  UNITS\n  LAB             6690-2(LOINC) WBC # Bld Auto 10.99       "

> LABregexlabname
[1] "LAB[[:print:][:blank:]]+WBC[[:print:][:blank:]]+([\\.[:digit:]]+)[:blank:]*?"

> gregexpr( LABregexlabname, test)
[[1]]
[1] 11
attr(,"match.length")
[1] 46
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE

> substring( test, 11, 11+46)
[1] "LAB             6690-2(LOINC) WBC # Bld Auto 10"

CodePudding user response:

Place the last [:blank:] inside [] as [[:blank:]] and use perl=TRUE.

test <- "  UNITS\n  LAB             6690-2(LOINC) WBC # Bld Auto 10.99       "
LABregexlabname <- "LAB[[:print:][:blank:]]+WBC[[:print:][:blank:]]+([.[:digit:]]+)[[:blank:]]*?"

regmatches(test, regexpr(LABregexlabname, test, perl=TRUE))
#[1] "LAB             6690-2(LOINC) WBC # Bld Auto 10.99"
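As a small illustration of why the brackets matter: with the default TRE engine, a bare [:blank:] is read as an ordinary bracket expression containing the characters ':', 'b', 'l', 'a', 'n' and 'k', not as the POSIX class. The string and variable names below are made up for the demonstration.

```r
# Hypothetical demo string; not taken from the question.
s <- "a blank b"

# Bare form: treated as the character set {:, b, l, a, n, k},
# so the letters are removed and the spaces are kept.
stripped_wrong <- gsub("[:blank:]", "", s)

# Class inside a bracket expression: removes spaces and tabs.
stripped_right <- gsub("[[:blank:]]", "", s)

print(stripped_wrong)
print(stripped_right)
```

As far as I can tell, PCRE (perl=TRUE) rejects the bare form as an invalid pattern instead of accepting it silently, which makes this kind of mistake easier to spot.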

It looks like TRE switches to minimal matching everywhere when the pattern ends in a lazy ?. Without the ?, TRE also returns the whole number, but together with all the trailing spaces. So maybe drop the [[:blank:]]* as well?

LABregexlabname <- "LAB[[:print:][:blank:]]+WBC[[:print:][:blank:]]+([.[:digit:]]+)[[:blank:]]*"
regmatches(test, regexpr(LABregexlabname, test))
#[1] "LAB             6690-2(LOINC) WBC # Bld Auto 10.99       "

LABregexlabname <- "LAB[[:print:][:blank:]]+WBC[[:print:][:blank:]]+([.[:digit:]]+)"
regmatches(test, regexpr(LABregexlabname, test))
#[1] "LAB             6690-2(LOINC) WBC # Bld Auto 10.99"
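If the goal is the number itself and not the whole matched line, one option is to read the capture group with regexec(). Note that in the patterns above the greedy [[:print:][:blank:]]+ in front of the group can swallow most of the number, so the group may hold only its last digit even when the overall match looks right. The sketch below avoids that with a narrower pattern; value_pattern and wbc_value are hypothetical names, and it assumes that only non-digit text sits between "WBC" and the value.

```r
test <- "  UNITS\n  LAB             6690-2(LOINC) WBC # Bld Auto 10.99       "

# Assumed pattern: "WBC", then any non-digits, then an integer with an
# optional fractional part captured in group 1.
value_pattern <- "WBC\\D*([0-9]+(?:\\.[0-9]+)?)"

# regexec() returns the full match first, then each capture group.
m <- regmatches(test, regexec(value_pattern, test, perl = TRUE))[[1]]

wbc_value <- as.numeric(m[2])
print(m)
print(wbc_value)
```

Requiring digits on both sides of the dot also keeps a stray "." from being treated as part of a number.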

CodePudding user response:

  • We can use
x <- c("weight is 25.8 kg" , "distance is 0.06 km" ,
       "tall 12.012 m")

gsub("\\D*([\\.[:digit:]]+).*", "\\1", x)
  • Output
[1] "25.8"   "0.06"   "12.012"
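One caveat with the gsub() approach is that a string containing no number comes back unchanged. A possible alternative, sketched here with hypothetical names (number_pattern, first_number) and an extra made-up input without digits, is to extract matches with gregexpr() and regmatches() and return NA when nothing is found.

```r
x <- c("weight is 25.8 kg", "distance is 0.06 km",
       "tall 12.012 m", "no number here")

# Assumed pattern: digits, an optional dot, then optional further digits.
number_pattern <- "[0-9]+\\.?[0-9]*"

# One character vector of matches per input string (possibly empty).
found <- regmatches(x, gregexpr(number_pattern, x))

# Keep the first match per string, or NA when there is none.
first_number <- vapply(
  found,
  function(m) if (length(m) > 0) m[1] else NA_character_,
  character(1)
)
print(first_number)
```

Because found keeps every match, the same sketch also covers strings that contain more than one number.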
