Home > database >  python sqlite string comparison
python sqlite string comparison

Time:12-05

I have got an sqlite database with multiple rows in a table. One of the colums holds strings with an arbitrary amount of positive integers, so:

1: '### ## # # # ## # ##'
2: '# ## # ## # ##'
3: '# # # ## ## ### #'
...

I have also gotten a comparison string in my python code that also looks like '## # ### #' The comparison string also holds an arbitrary amount of positive integers.

Now if we take a database string as being 'a b c d e f' and our comparison string as 'g h i j k' for example, I need to test if any 3 consecutive numbers in my comparison string(so 'g h i' or 'h i j' or 'i j k') are also consecutive numbers in the database. So basically I need to test for each row in the database if

'g h i' == 'a b c' or 'g h i' == 'b c d' or ...

Is that doable with the way my data is currently saved in the database or do I need to save my data any other way. Either way, how can I do this with SQL?

CodePudding user response:

I would propose a different design for your table.

Instead of storing the numbers like this:

grp value
1 '10 100 20 5 70'
2 '100 20 5 35 3 15'

you could normalize the table so that each row contains only 1 value:

id grp value
1 1 10
2 1 100
3 1 20
4 1 5
5 1 70
6 2 100
7 2 20
8 2 5
9 2 35
10 2 3
11 2 15

This way it is easier to concatenate consecutive numbers (in triplets or anything else) so that you can check if the concatenated numbers are contained inside your string.

Create the table:

CREATE TABLE tablename(id INTEGER PRIMARY KEY AUTOINCREMENT, grp INTEGER, value INTEGER);

and for each row use window function LEAD() to get the next number and the one after that so that you create a concatenated triplet.
Then you can use the operator LIKE to check if the triplet is there in your string:

WITH cte AS (
  SELECT *, 
          value || ' ' ||
          LEAD(value, 1) OVER (PARTITION BY grp ORDER BY id) || ' ' || 
          LEAD(value, 2) OVER (PARTITION BY grp ORDER BY id) str
  FROM tablename
)
SELECT grp, MAX(' ' || ? || ' ' LIKE '% ' || str || ' %') flag
FROM cte 
GROUP BY grp;

Or, a better scalable solution:

WITH cte AS (
  SELECT *, 
         CASE WHEN ROW_NUMBER() OVER (PARTITION BY grp ORDER BY id) > 2
           THEN GROUP_CONCAT(value, ' ') OVER (
                    PARTITION BY grp ORDER BY id 
                    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
                )
         END str
  FROM tablename
)
SELECT grp, MAX(' ' || '6 100 20 5 12 19' || ' ' LIKE '% ' || str || ' %') flag
FROM cte 
GROUP BY grp;

Replace ? with your string.

See a simplified demo.

CodePudding user response:

If you can convert both strings into 'lists' of values, you can use set operations like INTERSECT to solve this.

First set up 2 tables - data is your existing data, while search is your string for comparison - with lists of data:

CREATE TABLE data(v1 int);
CREATE TABLE search(v2 int);
INSERT INTO data(v1) VALUES(123),(456),(789);
INSERT INTO search(v2) VALUES(456),(987);

Now you can compare the 2 tables, and get the intersection - i.e all items which are in both data and search:

SELECT v1 FROM data INTERSECT SELECT v2 FROM search;
# 456

Or if you just want to see if there are any matches or not, you can wrap this with EXISTS:

SELECT EXISTS (SELECT v1 FROM data INTERSECT SELECT v2 FROM search);
# 1 if there are matches, 0 otherwise
  • Related