I'm quite new to sql so I don't fully understand what I'm doing. My question is, how can I order by this data, that the order would go by number like 1,2,3, but there are letters inside, so it would be like 1,2,A3, B3, 4.
I have this so far:
ORDER BY REGEXP_REPLACE(TRIM(REGEXP_SUBSTR(string, '\d', 1, 1)),'(\d .)(\d .)?(\d .)?(\d .)?', '\1') NULLS FIRST, REGEXP_REPLACE(TRIM(REGEXP_SUBSTR(string, '\d', 2, 2)),'(\d .)(\d .)?(\d .)?(\d .)?', '\2') NULLS FIRST, REGEXP_REPLACE(TRIM(REGEXP_SUBSTR(string, '\d', 3, 3)),'(\d .)(\d .)?(\d .)?(\d .)?', '\3') NULLS FIRST, REGEXP_REPLACE(TRIM(REGEXP_SUBSTR(string, '\d', 4, 4)),'(\d .)(\d .)?(\d .)?(\d .)?', '\4') NULLS FIRST, REGEXP_REPLACE(TRIM(REGEXP_SUBSTR(string, '\d', 5, 5)),'(\d .)(\d .)?(\d .)?(\d .)?', '\5') NULLS FIRST, string
it works until we have a third level decimal, like 2.7.1., then it goes like 2.1, 2.2...2.7,2.8, 2.7.1., 2.7.2. I don't know if I'm explaining it correctly so I have pic too :D
When it's like this it works: Picture of Working sort
But then we have numbers like 2.7.1., 2.7.2. Picture with incorrect sort
So the plan is that it would sort number with string too, like: 1,2,A3,A3.1, A3.1.1. B3,B3.1,4.1, 4.1.1, 4.1.2...
Thank you so much for your help and explaining.
CodePudding user response:
The immediate problem is that you are incrementing the positionas well as the occurrence; so
REGEXP_SUBSTR(string, '\d', 2, 2)
should be
REGEXP_SUBSTR(string, '\d', 1, 2)
But you're overcomplicating it, and not handling multiple-digit elements, either when extracting or when sorting as nothing is treated as a number.
I think this does what you want:
ORDER BY
TO_NUMBER(REGEXP_SUBSTR(string, '\d ', 1, 1)) nulls first,
TO_NUMBER(REGEXP_SUBSTR(string, '\d ', 1, 2)) nulls first,
TO_NUMBER(REGEXP_SUBSTR(string, '\d ', 1, 3)) nulls first,
TO_NUMBER(REGEXP_SUBSTR(string, '\d ', 1, 4)) nulls first,
TO_NUMBER(REGEXP_SUBSTR(string, '\d ', 1, 5)) nulls first,
string
If only the first element can start with a character (or characters), then you just just add a check for that after the first digit:
ORDER BY
TO_NUMBER(REGEXP_SUBSTR(string, '\d ', 1, 1)) nulls first,
REGEXP_SUBSTR(string, '^\w ', 1, 1) nulls first,
TO_NUMBER(REGEXP_SUBSTR(string, '\d ', 1, 2)) nulls first,
TO_NUMBER(REGEXP_SUBSTR(string, '\d ', 1, 3)) nulls first,
TO_NUMBER(REGEXP_SUBSTR(string, '\d ', 1, 4)) nulls first,
TO_NUMBER(REGEXP_SUBSTR(string, '\d ', 1, 5)) nulls first
which will order 3,A3,A3.1,B3
.
You might not need the string
at the end now.
CodePudding user response:
What you mainly need is an idea how to approach this. As a human I can easily see what you are sorting these strings by: 1,2,A3,A3.1, A3.1.1. B3,B3.1,4.1, 4.1.1, 4.1.2..., but a computer wouldn't know that an A3 is to be treated as 3 plus a little and B3 as 3 plus a little more. And as these are strings, the computer won't understand either why '2' is to be sorted before '10' when it comes to numbers with more than one digit, because the string '10' starts with a '1', while the string '2' starts with a '2', which comes after '1'.
Let's start with the first problem. The computer doesn't see '3' < 'A3' < 'B3' < '4', but it would see '3' < '3A' < '3B' < '4'. So why not switch positions:
REGEXP_REPLACE(col, '([[:alpha:]])([[:digit:]] )', '\2\1')
Then we must address the second problem. The computer doesn't see '1' < '2' < '10', but it would see '01' < '02' < '10'. So, optimally we would precede all numbers with leading zeros, so they all have the same length. For this to be possible, we must know the longest length allowed for a number. Let's say it is three places, like in '123' and 'B123'. Then we must make all numbers three digit numbers ('012', '012A', '123', '123B'). Unfortunately regular expressions cannot do that (or I don't know how). I'd have to kind of hard code: "find one-digit numbers and make them three-digit numbers" and "find two-digit numbers and make them three-digit numbers". And still I'd have to trick myself through this by adding unnecessary zeros first and then remove them again.
Here is the whole thing combined:
select
col,
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(col,
'([[:alpha:]])([[:digit:]] )',
'\2\1' -- step 1: switch number / alpha positions
),
'([[:digit:]] )',
'00\1' -- step 2: precede all numbers with two zeros
),
'0([[:digit:]]{4,4})',
'\1' -- step 3: remove one leading zero from five-digit numbers
),
'0([[:digit:]]{3,3})',
'\1' -- step 4: remove one leading zero from (now) four-digit numbers
) as sortkey
from mytable
order by sortkey;
Demo: https://dbfiddle.uk/?rdbms=oracle_21&fiddle=dd21d406a88846285fed1d3c05595417
Or write a PL/SQL function to generate the sortkey.