The SQL Server function OBJECT_DEFINITION (Transact-SQL)
returns a
nvarchar(max)
For example this SQL:
select
name,
--modify_date,
object_definition(object_id) as SQLText
from sys.procedures
where schema_id = 1
I am wondering how to work with the nvarchar(max)
result?
How would I trim trailing and leading empty lines for example?
TRIM doesn't work on varchar(max) or nvarchar(max) according to the docs
CodePudding user response:
The following trims tabs, spaces, new lines, and carriage returns and will work with MAX
datatype.
DECLARE @TextToTrim NVARCHAR(MAX) = '
Some Text
'
DECLARE @Pattern NVARCHAR(100) = '%[^' CHAR(13) CHAR(10) CHAR(32) CHAR(9) ']%';
PRINT
'~~~'
SUBSTRING(@TextToTrim,
PATINDEX(@Pattern, @TextToTrim 'x'),
LEN(@TextToTrim 'x') -PATINDEX(@Pattern, REVERSE(@TextToTrim)) - PATINDEX(@Pattern, @TextToTrim 'x') 1)
'~~~'
;
Returns
~~~Some Text~~~
CodePudding user response:
SQL in general though, and T-SQL in particular, are very weak when it comes to string manipulation. TRIM
doesn't work on LOB types like varchar(max)
and nvarchar(max)
while RTRIM doesn't allow you to specify the character to trim. It's better to use another language like C#, Python etc to process strings on the client.
If you want to keep things in the database, SQL Server 2017 allows you to call Python scripts on a query result using sp_execute_external_script
. Python is very strong in string manipulation and the distribution that comes with SQL Server is geared towards data science and includes powerful text processing packages, even Natural Language Processing packages.
You can trim whitespace (including newlines) simply by calling strip :
declare @sql nvarchar(max) =N'
select
name,
object_definition(object_id) as SqlText
from sys.procedures
where schema_id = 1'
declare @script nvarchar(max)=N'
OutputDataSet=InputDataSet
OutputDataSet["SqlText"]=OutputDataSet["SqlText"].str.strip()
'
exec sp_execute_external_script @language = N'Python'
, @script =@script
, @input_data_1=@sql
InputDataSet
is a Pandas dataframe containing the input query's results.OutputDataSet
is a dataframe containing the output of the script.OutputDataSet["SqlText"].str.strip()
applies the string functionstrip
to the entire "column" (series)SqlText
and returns a new column.- Finally,
OutputDataSet["SqlText"]=...
adds the new column toOutputDataSet
with the specified column name. If the column already exists, it's replaced.
The same result can be achieved with a single line:
declare @script nvarchar(max)=N'
OutputDataSet=InputDataSet.assign(SqlText=InputDataSet["SqlText"].str.strip())
'
The results of the SQL query would be
create procedure dbo.sp_MScleanupmergepublisher
as
exec sys.sp_MScleanupmergepublisher_internal
create procedure dbo.sp_MSrepl_startup
as
exec sys.sp_MSrepl_startup_internal
The results after executing the Python script are :
create procedure dbo.sp_MScleanupmergepublisher
as
exec sys.sp_MScleanupmergepublisher_internal
create procedure dbo.sp_MSrepl_startup
as
exec sys.sp_MSrepl_startup_internal
Pandas (and Python) offer far more powerful string processing functions including padding, zero-filling, slicing, slitting into arrays, regular expressions, multiple types of case conversion including Title etc. Many of these functions may be trivial in C# or Python but very complex to implement in T-SQL.
For example, replace
can be used to strip all whitespace from the end of a column with :
OutputDataSet["SqlText"]=OutputDataSet["SqlText"].str.replace("\s*$",""))