Home > Back-end >  OBJECT_DEFINITION (Transact-SQL) - trim trailing and leading empty lines from nvarchar(max) result
OBJECT_DEFINITION (Transact-SQL) - trim trailing and leading empty lines from nvarchar(max) result

Time:07-28

The SQL Server function OBJECT_DEFINITION (Transact-SQL) returns a nvarchar(max)

For example this SQL:

select 
    name,
    --modify_date,
    object_definition(object_id) as SQLText
 from sys.procedures 
where schema_id = 1

I am wondering how to work with the nvarchar(max) result? How would I trim trailing and leading empty lines for example?

TRIM doesn't work on varchar(max) or nvarchar(max) according to the docs

CodePudding user response:

The following trims tabs, spaces, new lines, and carriage returns and will work with MAX datatype.

DECLARE @TextToTrim NVARCHAR(MAX) = '


    Some Text


'

DECLARE @Pattern NVARCHAR(100) =  '%[^'   CHAR(13)   CHAR(10)   CHAR(32)   CHAR(9)   ']%';

PRINT 
'~~~'   
SUBSTRING(@TextToTrim, 
            PATINDEX(@Pattern, @TextToTrim   'x'), 
            LEN(@TextToTrim   'x') -PATINDEX(@Pattern, REVERSE(@TextToTrim)) -  PATINDEX(@Pattern, @TextToTrim   'x')   1)
  '~~~'         
            ;

Returns

~~~Some Text~~~

CodePudding user response:

SQL in general though, and T-SQL in particular, are very weak when it comes to string manipulation. TRIM doesn't work on LOB types like varchar(max) and nvarchar(max) while RTRIM doesn't allow you to specify the character to trim. It's better to use another language like C#, Python etc to process strings on the client.

If you want to keep things in the database, SQL Server 2017 allows you to call Python scripts on a query result using sp_execute_external_script. Python is very strong in string manipulation and the distribution that comes with SQL Server is geared towards data science and includes powerful text processing packages, even Natural Language Processing packages.

You can trim whitespace (including newlines) simply by calling strip :

declare @sql nvarchar(max) =N'
select 
    name,
    object_definition(object_id) as SqlText
 from sys.procedures 
where schema_id = 1'

declare @script nvarchar(max)=N'
OutputDataSet=InputDataSet
OutputDataSet["SqlText"]=OutputDataSet["SqlText"].str.strip()
'
exec sp_execute_external_script  @language = N'Python'
    , @script =@script
    , @input_data_1=@sql
  • InputDataSet is a Pandas dataframe containing the input query's results.
  • OutputDataSet is a dataframe containing the output of the script.
  • OutputDataSet["SqlText"].str.strip() applies the string function strip to the entire "column" (series) SqlText and returns a new column.
  • Finally, OutputDataSet["SqlText"]=... adds the new column to OutputDataSet with the specified column name. If the column already exists, it's replaced.

The same result can be achieved with a single line:

declare @script nvarchar(max)=N'
OutputDataSet=InputDataSet.assign(SqlText=InputDataSet["SqlText"].str.strip())
'

The results of the SQL query would be

create procedure dbo.sp_MScleanupmergepublisher
as
    exec sys.sp_MScleanupmergepublisher_internal

create procedure dbo.sp_MSrepl_startup
as
    exec sys.sp_MSrepl_startup_internal

The results after executing the Python script are :

create procedure dbo.sp_MScleanupmergepublisher
as
    exec sys.sp_MScleanupmergepublisher_internal
create procedure dbo.sp_MSrepl_startup
as
    exec sys.sp_MSrepl_startup_internal

Pandas (and Python) offer far more powerful string processing functions including padding, zero-filling, slicing, slitting into arrays, regular expressions, multiple types of case conversion including Title etc. Many of these functions may be trivial in C# or Python but very complex to implement in T-SQL.

For example, replace can be used to strip all whitespace from the end of a column with :

OutputDataSet["SqlText"]=OutputDataSet["SqlText"].str.replace("\s*$",""))
  • Related