Home > Software engineering >  sql server matching 2 table columns for an exact word match
sql server matching 2 table columns for an exact word match

Time:05-03

I want to search a table's varchar column for content in another table's varchar column.

Certain words are banned and I want to identify the rows that have the banned words. I want an EXACT match on the banned word.

I'm using MS SQL Server 2014.

Table 1:

CREATE TABLE [dbo].[BlogComment](
    [BlogCommentId] [int] IDENTITY(1,1) NOT NULL,
    [BlogCommentContent] [varchar](max) NOT NULL,
CONSTRAINT [PK_BlogComment] PRIMARY KEY CLUSTERED 
(
    [BlogCommentId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = 
ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

3 rows - and the data in BlogCommentContent:

There are many of us.

This is the man.

I hear you.

Table 2:

CREATE TABLE [dbo].[BannedWords](
    [BannedWordsId] [int] IDENTITY(1,1) NOT NULL,
    [Description] [varchar](250) NOT NULL
CONSTRAINT [PK_BannedWords] PRIMARY KEY CLUSTERED 
(
    [BannedWordsId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = 
ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

3 rows - and the data in Description:

though

man

hear


My Sql:

SELECT BlogCommentContent
FROM dbo.BlogComment,
     dbo.BannedWords
WHERE ( CHARINDEX( [Description], BlogCommentContent, 1 ) ) > 1

It's finding 'man', 'hear' and 'man' in the word 'many'. So it returns 3 rows.

I only WANT EXACT matches. So only return 2 rows.

How do I accomplish this?

CodePudding user response:

Please try the following solution.

It will work starting from SQL Server 2017 onwards.

SQL

-- DDL and sample data population, start
DECLARE @BlogComment TABLE (
    BlogCommentId INT IDENTITY PRIMARY KEY,
    BlogCommentContent VARCHAR(MAX));
INSERT INTO @BlogComment (BlogCommentContent) VALUES
('There are many of us.'),
('This is the man.'),
('I hear you.');

DECLARE @BannedWords TABLE (
    BannedWordsId INT IDENTITY PRIMARY KEY,
    Word varchar(250))
INSERT INTO @BannedWords (Word) VALUES
('though'),
('man'),
('hear');
-- DDL and sample data population, end

;WITH rs AS
(
    SELECT word = TRIM('.,' FROM value ) 
    FROM @BlogComment 
    CROSS APPLY STRING_SPLIT(BlogCommentContent, SPACE(1))
)
SELECT DISTINCT bw.Word 
FROM rs
    INNER JOIN @BannedWords bw ON rs.word = bw.Word;  

SQL Server 2014

;WITH rs AS
(
    --SELECT word = TRIM('.,' FROM [value]) 
    SELECT word = REPLACE(REPLACE([value],'.',''),',','')
    FROM @BlogComment 
    CROSS APPLY STRING_SPLIT(BlogCommentContent, SPACE(1))
)
SELECT DISTINCT bw.Word 
FROM rs
    INNER JOIN @BannedWords bw ON rs.word = bw.Word; 

Output

 ------ 
| Word |
 ------ 
| man  |
| hear |
 ------ 

CodePudding user response:

If you want exact matches what you mean is that there must not be another work touching so man can match "man", "man and women", "mice and men", "a man or two". You need to check

BlogCommentContent = 'man'
left(BlogCommentContent,3) = 'man'
right(BlogCommentContent,3) = 'man'
BlogCommentContent like ' man ' 

the length of man can be found with len('man') to be used in right() and left().
The last value, for like, can be constructed with concat(' ','man',' ')

  • Related