Home > Back-end >  String intersection using EF Core, Linq / SQL Server
String intersection using EF Core, Linq / SQL Server

Time:07-31

I'm trying to find if there is any match of

  • array of input strings
  • with comma separated strings

stored inside SQL Server:

class Meeting
{
    public int Id { get; set; }
    public string? MeetingName { get; set; }
    public string DbColumnCommaSeparated { get; set; }
}

meetingQuery.Where(x => ArrayOfInputString
  .Any(y => x.DbColumnCommaSeparated.Split(",").Contains(y)))

Is it feasible to do it in an EF Core query using DbFunctions's and SQL STRING_SPLIT?

CodePudding user response:

ANSWER 1: SQL CLR Approach
STEP 1: Test SQL Schema

create table Meeting
(
    Id int identity(1,1) primary key,
    MeetingName nvarchar(max) null,
    DbColumnCommaSeparated nvarchar(max) not null
)
go
truncate table Meeting
insert into Meeting
values('one','1,2,3,4');
insert into Meeting
values('two','5,6,7,8');
insert into Meeting
values('three','1,2,7,8');
insert into Meeting
values('four','11,22,73,84');
insert into Meeting
values('five','14,25,76,87');

STEP 2: Write SQL CLR Function read more

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Data.Linq;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlBoolean FilterCSVFunction(string source, string item)
    {
        return new SqlBoolean(Array.Exists(source.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries), i => i == item));
    }
}

STEP 3: Enable SQL CLR for database

EXEC sp_configure 'clr enabled', 1;  
RECONFIGURE;  
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'clr strict security', 0;
RECONFIGURE;

STEP 4: Actual Query

DECLARE @y nvarchar(max)
SET @y = '7'
SELECT * FROM Meeting
WHERE dbo.FilterCSVFunction(DbColumnCommaSeparated, @y) = 1

STEP 5: You can import the function in ef

ANSWER 2 Updated

DECLARE @ArrayOfInputString TABLE(y INT);
INSERT INTO @ArrayOfInputString(y) VALUES(7);
INSERT INTO @ArrayOfInputString(y) VALUES(8);

SELECT DISTINCT M1.* FROM Meeting M1 JOIN
(
    SELECT ID, Split.a.value('.', 'VARCHAR(100)') As Data FROM 
    (SELECT M2.ID, CAST ('<M>'   REPLACE(DbColumnCommaSeparated, ',', '</M><M>')   '</M>' AS XML) as DbXml FROM Meeting M2) A
    CROSS APPLY A.DbXml.nodes ('/M') AS Split(a)
) F ON M1.ID = F.ID
WHERE F.Data IN (SELECT y FROM @ArrayOfInputString)

CodePudding user response:

It's actually pretty easy thanks to of EF-core's smooth support for mapping database functions. In this case we need a table-valued function (TVF) that can be called in C# code.

It starts with adding a TVF to the database, which, when using migrations, requires an addition to migration code:

ALTER FUNCTION [dbo].[SplitCsv] (@string nvarchar(max))
RETURNS TABLE
AS
    RETURN SELECT ROW_NUMBER() OVER (ORDER BY Item) AS ID, Item
    FROM (SELECT Item = [value] FROM string_split(@string, ',')) AS items

This TVF is mapped to a simple class in EF's class model:

class CsvItem
{
    public long ID { get; set; }
    public string? Item { get; set; }
}

For this mapping to succeed the SQL function always returns unique ID values.

Then a method, only to be used in expressions, is added to the context:

public IQueryable<CsvItem> SplitCsv(string csv) => FromExpression(() => SplitCsv(csv));

...and added to the model:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.HasDbFunction(typeof(MyContext)
        .GetMethod(nameof(SplitCsv), new[] { typeof(string) })!);
}

That's all! Now you can execute queries like these (in Linqpad, db is a context instance):

db.Meetings.Where(m => db.SplitCsv(m.DbColumnCommaSeparated).Any(i => i.Item == "c")).Dump();

var csv = "a,d,d";
db.Meetings.Where(m => db.SplitCsv(m.DbColumnCommaSeparated)
    .Intersect(db.SplitCsv(csv)).Any()).Dump();

Generated SQL:

  SELECT [m].[Id], [m].[DbColumnCommaSeparated], [m].[MeetingName]
  FROM [Meetings] AS [m]
  WHERE EXISTS (
      SELECT 1
      FROM [dbo].[SplitCsv]([m].[DbColumnCommaSeparated]) AS [s]
      WHERE [s].[Item] = N'c')

  SELECT [m].[Id], [m].[DbColumnCommaSeparated], [m].[MeetingName]
  FROM [Meetings] AS [m]
  WHERE EXISTS (
      SELECT 1
      FROM (
          SELECT [s].[ID], [s].[Item]
          FROM [dbo].[SplitCsv]([m].[DbColumnCommaSeparated]) AS [s]
          INTERSECT
          SELECT [s0].[ID], [s0].[Item]
          FROM [dbo].[SplitCsv](@__csv_1) AS [s0]
      ) AS [t])

One remark. Although it can be done, I wouldn't promote it. It's usually much better to store such csv values as a table in the database. It's much easier in maintenance (data integrity!) and querying.

CodePudding user response:

EF Core has FromSqlRaw why not using it ?

here is an extension method that would work with DbSet<T> :

public static class EntityFrameworkExtensions
{
    private const string _stringSplitTemplate = "SELECT * FROM {0} t1 WHERE EXISTS (SELECT 1 FROM STRING_SPLIT(t1.{1}, ',') s WHERE s.[value] IN({2}));";

    public static IQueryable<TEntity> StringSplit<TEntity, TValue>(this DbSet<TEntity> entity, Expression<Func<TEntity, TValue>> keySelector, IEnumerable<TValue> values) 
        where TEntity : class
    {         
        var columnName = (keySelector.Body as MemberExpression)?.Member?.Name;

        if (columnName == null) return entity;

        var queryString = string.Format(_stringSplitTemplate, entity.EntityType.GetTableName(), columnName, string.Join(',', values));

        return entity.FromSqlRaw(queryString);
    }
}

usage :

var result = context.Meetings.StringSplit(x=> x.DbColumnCommaSeparated, ArrayOfInputString).ToList();

this would generate the following SQL :

SELECT *
FROM table t1
WHERE EXISTS (
    SELECT 1
    FROM STRING_SPLIT(t1.column, ',') s
    WHERE 
        s.value IN(...)
);
  • Related