Home > Software engineering >  Filtering SQL rows based on certain alphabets combination
Filtering SQL rows based on certain alphabets combination

Time:01-10

I have a column that store user input text field from a frontend website. User can input any kind of text in it, but they will also put in a specific alphabets combination to represent a job type - for example 'dri'. As an example:

Row 1: P49384; Open vehicle bonnet-BO-dri 22/10
Row 2: P93818; Vehicle exhaust-BO 10/20
Row 3: P1933; battery dri-pu-103/2
Row 4: P3193; screwdriver-pu 423
Row 5: X939; seats bo
Row 6: P9381-vehicle-pu-bo dri

In this case, I will like to filter only rows that contain dri. From the example, you can see the text can be in any order (user behaviour, they will key whatever they like without following any kind of format). But the constant is that for a particular job type, they will put in dri.

I know that I can simply use LIKE in SQL Server to get these rows. Unfortunately, row 4 is included inside when I use this operator. This is because screwdriver contains dri.

Is there any way in SQL Server I can do to strictly only obtain rows that has dri job type, while excluding words like screwdriver?

I tried to use PATINDEX but it failed too - PATINDEX('%[d][r][i]%', column) > 0

Thanks in advance.

CodePudding user response:

Your data is the problem here. Unfortunately even for denormalised data it doesn't appear to have a reliable/defined format, making parsing your data in a language like T-SQL next to impossible. What problems are there? Based on the original sample data, at a glance the following problems exist:

  • The first data value's delimiter isn't consistent. Rows 1-5 use a semicolon (;), but row 6 uses a hyphen (-)
  • The last data value's delimiter isn't consistent. Row 1, 2 & 4 use a space ( ), but row 3 uses a hyphen (-).
  • Internal data doesn't use a consistent delimiter. For example:
    • Row 1 has a the value Open vehicle bonnet-BO-dri, which appears to be the values Open vehicle bonnet, BO and dri; so the hyphen(-) is the delimiter.
    • Row 5 has seats bo, which appears to be the values seats and bo, so uses a space ( ) as a delimiter.
    • The fact that row 6 has vehicle as its own value (vehicle-pu-bo-dri), however, implies that Open vehicle bonnet and Vehicle Exhaust (on rows 1 and 2 respectively) could actually be the values Open, vehicle, & bonnet and Vehicle & Exhaust respectively.

Honestly, the solution is to fix your design. As such, your tables should likely look something like this:

CREATE TABLE dbo.Job (JobID varchar(6) CONSTRAINT PK_JobID PRIMARY KEY NONCLUSTERED, --NONCLUSTERED Because it's not always ascending
                      YourNumericalLikeValue varchar(5) NULL); --Obviously use a better name

CREATE TABLE dbo.JobTypeCompleted(JobTypeID int IDENTITY (1,1) CONSTRAINT PK_JobTypeID PRIMARY KEY CLUSTERED,
                                  JobID varchar(6) NOT NULL CONSTRAINT FK_JobType_Job FOREIGN KEY REFERENCES dbo.Job (JobID),
                                  JobType varchar(30) NOT NULL); --Must likely this'll actually be a foreign key to an actual job type table

GO

Then, for a couple of your rows, the data would be inserted like so:

INSERT INTO dbo.Job (JobID, YourNumericalLikeValue)
VALUES('P49384','22/10'),
      ('P9381',NULL);
GO

INSERT INTO dbo.JobTypeCompleted(JobID,JobType)
VALUES('P49384','Open vehicle bonnet'),
      ('P49384','BO'),
      ('P49384','dri'),
      ('P9381','vehicle'),
      ('P9381','pu'),
      ('P9381','bo'),
      ('P9381','dri');

Then you can easily get the jobs you want with a simple query:

SELECT J.JobID,
       J.YourNumericalLikeValue
FROM dbo.Job J
WHERE EXISTS (SELECT 1
              FROM dbo.JobTypeCompleted JTC
              WHERE JTC.JobID = J.JobID
                AND JTC.JobType = 'dri');

CodePudding user response:

You can apply like operator in your query as column_name like '%-dri'. It means find out records that end with "-dri"

  • Related