Home > Software design >  oracle - regex to find parts of a version number
oracle - regex to find parts of a version number

Time:11-09

I have a field in a table for software-style version numbers, i.e., one to three numeric values separated by "dots" but these are not decimal numbers. For example the version after 3.9 might be 3.10, then 3.11, etc. I'm trying to build a query to find the different parts of the version number (major version, minor version, and build#), using REGEX_SUBSTR in Oracle:

SELECT version,
    to_number(regexp_substr(version, '[^.] ', 1)) major,
    to_number(regexp_substr(version, '[^.] ', 2)) minor,
    to_number(regexp_substr(version, '[^.] ', 3)) build
FROM mytable

However, I'm getting some weird behavior which tells me I don't have the regex quite right. For example, if my version number has two or three single-digit parts, the second digit comes up as both the "minor" and "build" numbers. Here are some samples of the results

"3" -> 3/null/null  (the intended behavior)
"3.0" -> 3/0/0
"3.0.1" -> 3/0/0
"33.0" -> 33/3/0

How can I change the regex (or the whole query) to more easily get the parts correctly, and nulls for parts that aren't present?

FYI, I adapted these expressions from this old question about sorting by version numbers: Find the greatest version

CodePudding user response:

You're missing an argument to regexp_substr(); you're supplying the starting position for each search, not the occurrence you want to match.

At the moment it's doing:

  • major: start at the first character, find any non-period characters.
  • minor: start at the second character, find any non-period characters.
  • build: start at the third character, find any non-period characters.

which means for your values:

  • "3" - major starts at "3" and stops, and finds nothing for minor/build.
  • "3.0" - major starts at "3" and stops at the first period; minor starts at the second character, so ".0", skips the period because it has to find at least one non-period so moves on to the zero, and stops at the end; build starts at the third character, so "0", and stops at the end.
  • "3.0.1" - major starts at "3" and stops at the first period; minor starts at the second character, so ".0", skips the first period because it has to find at least one non-period so moves on to the zero, and stops at the second period; build starts at the third character, so "0", and also stops at second period.
  • "33.3" - major starts at first "3" and stops at the first period, giving 33; minor starts at the second character, so the second "3", and stops at the first period; build starts at the third character, so ".0", skips the period because it has to find at least one non-period so moves on to the zero, and stops at the end, giving 0.

You need to supply both of those arguments:

SELECT version,
    to_number(regexp_substr(version, '[^.] ', 1, 1)) major,
    to_number(regexp_substr(version, '[^.] ', 1, 2)) minor,
    to_number(regexp_substr(version, '[^.] ', 1, 3)) build
FROM mytable
VERSION MAJOR MINOR BUILD
3 3 null null
3.0 3 0 null
3.0.1 3 0 1
33.0 33 0 null

fiddle

  • Related