Home > other >  How to extract values from a free-text varchar value as separate records in SQL
How to extract values from a free-text varchar value as separate records in SQL

Time:01-17

In SQL Server, what is the best way to extract values as records from a table with free-text varchar values?

I've got a table with a varchar column, which contains free text including some lab values. So it contains unstructured text, but luckily the lab values are recognisable by brackets, for example [NA/bl: 137].

For each LabId I want to extract the lab values into a new separate table for further processing. The problem is that the number of lab values in the varchar text is varying, for some records it is 3 values, for others 5 values, sometimes no values at all.

I've created a representative data set to reproduce the problem, see the example below

CREATE TABLE #TestTable123(
    [_record_number] int IDENTITY(1,1) PRIMARY KEY,
    [LabId] Integer,
    [LabDate] datetime,
    LabDescription varchar(200)
);

insert into #TestTable123(
    LabId,
    LabDate,
    LabDescription
) values
('1001', '2022-02-13', 'questionnaire completed, labvalues [NA/bl: 141] [HCT/blo: 0.39] [HGB: 8.2] [WBC: 7.0], cardiotest completed'),
('1002', '2021-04-10', 'noshow'),
('1003', '2021-10-18', 'questionnaire completed, lab [NA/bl: 138] [HCT/blo: 0.29] [HGB: 4.7]'),
('1004', '2022-06-07', 'labresults [NA/bl: 140] [HCT/blo: 0.31] [HGB: 5.5] [WBC: 3.2], questionnaire completed'),
('1005', '2021-11-26', 'lab [NA/bl: 136] [HCT/blo: 0.38] [HGB: 6.8]')

I tried the following SQL but this only results in the first lab result within each varchar text. So in the example data this is only the NA/bl values.

select
    LabId,
    substring(LabDescription, charindex('[', LabDescription), charindex(']', LabDescription)-charindex('[', LabDescription)   1)
from
    #TestTable123
where
    LabDescription like '%\[%' escape '\'

What I'm looking for, is the following result dataset

LabId  LabResult_extracted
1001   [NA/bl: 141]
1001   [HCT/blo: 0.39]
1001   [HGB: 8.2]
1001   [WBC: 7.0]
1003   [NA/bl: 138]
1003   [HCT/blo: 0.29]
1003   [HGB: 4.7]
1004   [NA/bl: 140]
1004   [HCT/blo: 0.31]
1004   [HGB: 5.5]
1004   [WBC: 3.2]
1005   [NA/bl: 140]
1005   [HCT/blo: 0.38]
1005   [HGB: 6.8]

What is the best way to get this result?

CodePudding user response:

What is the best way to get this result?

INSERT the data in a normalised form in the first place. Have 2 (or more tables) with a normalised design, and have a n to many relationship for your data. It seems like you have 3 different pieces of information in that column, and (some of) those values can also represent multiple values. Take the value 'questionnaire completed, labvalues [NA/bl: 141] [HCT/blo: 0.39] [HGB: 8.2] [WBC: 7.0], cardiotest completed', for example. You have the strings questionnaire completed and labvalues. Then you have multiple lab results, and then a further cardiotest completed value. This should be at least 3 columns, and probably at least 2 extra tables.

For the values you want, you can do this is SQL Server, but it's not ideal. If your plan, in the future, is to be able to search for rows with specific values (like '[HGB: 5.5]') then normalisation is a must.

For the sample data, and expected results you've given, however, this is one method, but it's not the "best" way:

SELECT TT123.LabId,
       TRIM(SS.[value])   ']' AS Description
FROM #TestTable123 TT123
     CROSS APPLY (VALUES(STUFF(TT123.LabDescription,1,NULLIF(CHARINDEX('[',TT123.LabDescription),0)-1,'')))V(Descriptions)
     CROSS APPLY STRING_SPLIT(V.Descriptions,']') SS
WHERE TRIM(SS.[value]) LIKE '\[%' ESCAPE '\';

CodePudding user response:

Another option is with the use of a helper function

Now, this function strips the [], but it would be a small matter to add them back if desired

Example

Select A.LabID 
      ,B.*
 From #TestTable123 A
 Cross Apply [dbo].[tvf-Str-Extract-JSON](LabDescription,'[',']') B

Results

enter image description here

The Table-Valued Function if Interestes

CREATE FUNCTION [dbo].[tvf-Str-Extract-JSON] (@String nvarchar(max),@Delim1 nvarchar(100),@Delim2 nvarchar(100))
Returns Table 
As
Return (  

    Select RetSeq = row_number() over (order by RetSeq)
          ,RetVal = left(RetVal,charindex(@Delim2,RetVal)-1)
    From  (
            Select RetSeq = [Key] 1
                  ,RetVal = trim(Value)
             From  OpenJSON( N'["' replace(string_escape(@String,'json'),@Delim1,'","') N'"]' )

          ) C1
    Where charindex(@Delim2,RetVal)>1

)

EDIT: If you don't want the TVF

Select A.LabID 
      ,B.*
 From #TestTable123 A
 Cross Apply (
                Select RetSeq = row_number() over (order by RetSeq)
                      ,RetVal = left(RetVal,charindex(']',RetVal)-1)
                From  (
                        Select RetSeq = [Key] 1
                              ,RetVal = trim(Value)
                         From  OpenJSON( N'["' replace(string_escape(LabDescription,'json'),'[','","') N'"]' )
                      ) C1
                Where charindex(']',RetVal)>1
 ) B
  • Related