Get characters before the numbers in string-CodePudding

I have a column in SQL Server which contains a characters and a random numbers or a dash and random character string. I want to split the character part. Does anyone have any suggestion?

create table stage.test(
    geo_area varchar(50) null
)
    
insert into stage.test values ('mobile-pensacola (ft walt) (686)')
insert into stage.test values ('rj-globo rio (76008)' )
insert into stage.test values ('ce2-tv (6666)' )
insert into stage.test values ('mumbai metropolitan region (356002)')

Tried this query which is not working as expected

 select left(geo_area, len(left(geo_area, 50) ) - 6)  as geo_area
 from stage.test

Expected output

    mobile-pensacola (ft walt)
    rj-globo rio
    ce2-tv
    mumbai metropolitan region

CodePudding user response：

Here is a more generic solution based on tokenization.

It will remove islands of digits surrounded by any delimiter, parenthesis or otherwise, regardless of their location. I added one extra row to demonstrate it.

SQL

-- DDL and sample data population, start
DECLARE @tbl table (ID INT IDENTITY PRIMARY KEY, geo_area varchar(50) NULL);
INSERT INTO @tbl (geo_area) VALUES
('mobile-pensacola (ft walt) (686)'),
('rj-globo rio (76008)'),
('ce2-tv (6666)'),
('mumbai metropolitan region (356002)'),
('mumbai (356002) metropolitan region');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = SPACE(1);

SELECT t.*
    , c.query('
       for $x in /root/r/text()
       return if (xs:int(substring($x,2,string-length($x) - 2)) instance of xs:int) then ()
            else data($x)
       ').value('.', 'VARCHAR(MAX)') AS Result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['   
        REPLACE(geo_area, @separator, ']]></r><r><![CDATA[')   
        ']]></r></root>' AS XML)) AS t1(c);

Output

 ---- ------------------------------------- ---------------------------- 
| ID |              geo_area               |           Result           |
 ---- ------------------------------------- ---------------------------- 
|  1 | mobile-pensacola (ft walt) (686)    | mobile-pensacola (ft walt) |
|  2 | rj-globo rio (76008)                | rj-globo rio               |
|  3 | ce2-tv (6666)                       | ce2-tv                     |
|  4 | mumbai metropolitan region (356002) | mumbai metropolitan region |
|  5 | mumbai (356002) metropolitan region | mumbai metropolitan region |
 ---- ------------------------------------- ----------------------------

CodePudding user response：

You could use patindex() to find the first occurence of the sections to remove:

with sampledata as (
    select * from (values
        ('mobile-pensacola (ft walt) (686)'),
        ('    rj-globo rio (76008)'),
        ('vz-jp (6666)'),
        ('mumbai metropolitan region (356002)')
    )x(col)
)
select trim(Left(col, PatIndex('%([0-9]%', col)-1)) NewCol
from sampledata;

Example DB<>Fiddle