I have a column in SQL Server which contains a characters and a random numbers or a dash and random character string. I want to split the character part. Does anyone have any suggestion?
create table stage.test(
geo_area varchar(50) null
)
insert into stage.test values ('mobile-pensacola (ft walt) (686)')
insert into stage.test values ('rj-globo rio (76008)' )
insert into stage.test values ('ce2-tv (6666)' )
insert into stage.test values ('mumbai metropolitan region (356002)')
Tried this query which is not working as expected
select left(geo_area, len(left(geo_area, 50) ) - 6) as geo_area
from stage.test
Expected output
mobile-pensacola (ft walt)
rj-globo rio
ce2-tv
mumbai metropolitan region
CodePudding user response:
Here is a more generic solution based on tokenization.
It will remove islands of digits surrounded by any delimiter, parenthesis or otherwise, regardless of their location. I added one extra row to demonstrate it.
SQL
-- DDL and sample data population, start
DECLARE @tbl table (ID INT IDENTITY PRIMARY KEY, geo_area varchar(50) NULL);
INSERT INTO @tbl (geo_area) VALUES
('mobile-pensacola (ft walt) (686)'),
('rj-globo rio (76008)'),
('ce2-tv (6666)'),
('mumbai metropolitan region (356002)'),
('mumbai (356002) metropolitan region');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = SPACE(1);
SELECT t.*
, c.query('
for $x in /root/r/text()
return if (xs:int(substring($x,2,string-length($x) - 2)) instance of xs:int) then ()
else data($x)
').value('.', 'VARCHAR(MAX)') AS Result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['
REPLACE(geo_area, @separator, ']]></r><r><![CDATA[')
']]></r></root>' AS XML)) AS t1(c);
Output
---- ------------------------------------- ----------------------------
| ID | geo_area | Result |
---- ------------------------------------- ----------------------------
| 1 | mobile-pensacola (ft walt) (686) | mobile-pensacola (ft walt) |
| 2 | rj-globo rio (76008) | rj-globo rio |
| 3 | ce2-tv (6666) | ce2-tv |
| 4 | mumbai metropolitan region (356002) | mumbai metropolitan region |
| 5 | mumbai (356002) metropolitan region | mumbai metropolitan region |
---- ------------------------------------- ----------------------------
CodePudding user response:
You could use patindex()
to find the first occurence of the sections to remove:
with sampledata as (
select * from (values
('mobile-pensacola (ft walt) (686)'),
(' rj-globo rio (76008)'),
('vz-jp (6666)'),
('mumbai metropolitan region (356002)')
)x(col)
)
select trim(Left(col, PatIndex('%([0-9]%', col)-1)) NewCol
from sampledata;