Home > Software design >  In SQL, how do I find first non-matching character between two strings
In SQL, how do I find first non-matching character between two strings

Time:09-02

I am using MariaDB to store hierarchical items whose key is a string of identical length.

It is the key that encodes the relationships (parent, child, ancestor, descendant, none) as follows.

  • The leftmost non-zero character-substring of the key identifies an item, with the rest of the string a pad of zeros.
  • If the two keys of any two items A and B differ at the first character, A and B are not related (they don't have a common ancestor)
  • otherwise A and B are related with A being the ancestor of B if the leftmost non-zero character-substring of A is a substring of the leftmost non-zero character-substring of B. In other words if the position of the leftmost zero character in the key for A is less than the position of the leftmost zero character in the key for B i.e. if the pad of zeros in A is greater in length than the pad of zeros in B.

I would like to write an SQL query code that takes a string (a possible key) and returns the ancestors keys of this string. I am able to write the version that returns the descendants as follows. I first remove the zero-character pad on the right of the candidate string then issue the following...

SELECT keys FROM items WHERE LEFT( key, CHAR_LENGTH(string) ) = string

but writing one that returns the ancestors is a headache. Please help me.

CodePudding user response:

It sounds like you're using a variation of the hierarchical data solution called "materialized path."

You can find keys that have string as the last part of the path, followed by zeroes:

SELECT key FROM items 
WHERE key REGEXP CONCAT(string, '0*$')

CodePudding user response:

Than you Bill Karwin, you are correct, this is a very restricted variation of the materialized path technique. The restriction is that the path cannot grow 'infinitely'. The tree the keys encode has a fixed maximum height.

The identity of an item here (the path) is the substring of its key from the first character to the character that comes before the first zero character. For example for 7-character-wide 'path' that encodes a tree of maximum height 7

item    key (path)
 A      qwer000
 B      qewr000
 C      qw00000
 D      dqwe000

The first three (A, B and C) share the common ancestor 'q' but none of them is related to D. The item C is an ancestor of A because its identity (path) is a substring of the identity (path) of A.

With SQL you tell if a string is a substring of another using LOCATE, POSITION, INSTR.

How do you get the substrings from the path i.e.

the 'qw' from 'qw00000' 

and

the 'qwer' from 'qwer000'

so as to issue say

LOCATE ('qw', 'qwer')

You use the SQL function SUBSTRING_INDEX as

SUBSTRING_INDEX (path, '0', 1)

So to tell if a key strP is an ancestor of a key strQ would be

LOCATE ( SUBSTRING_INDEX (strP, '0', 1), SUBSTRING_INDEX (strQ, '0', 1) )

If this returns a value greater than zero, strP is an ancestor of strQ

  • Related