Home > OS >  Matching strings between columns based on position
Matching strings between columns based on position

Time:11-25

I have a view that aggregates data about customers and shows the products they have access to, along with the status of whether they use those products on a trial basis or not (both as string comma seperated values):

 ---------- ---------- ----------------------- 
| customer | products | products_trial_status |
 ---------- ---------- ----------------------- 
|   234253 | A,B,C    | false,true,false      |
|   923403 | A,C      | true,true             |
|   123483 | B        | true                  |
|   239874 | B,C      | false,false           |
 ---------- ---------- ----------------------- 

and I would like to write a query that returns a list of customers who are using a certain product on a trial.

e.g. I want to see which customers using product B are on a trial, I would get something like this:

 ---------- 
| customer |
 ---------- 
|   234253 | 
|   123483 |
 ---------- 

The only way I can think of doing this is by checking the products column for the position of the product in the string (if it exists there), then checking the corresponding value at the same position in the products_trial_status column and whether it is equal to true.

i.e. for customer 234253, product B is in position 2 (after the first comma), so it's corresponding trial status in the column would also be in position 2 after the first comma there.

How would I go about doing this?

I am aware that storing such data as a string of values is not good practice but it is not something i can change, so would need to work out using the format it is in

CodePudding user response:

You could split the string but it will be quicker to do some (fairly hideous) string manipulations and

  • Replace your comma-delimited true/false string with a non-delimited string of 1s and 0s.
  • Count the number of terms before the B term by counting the number of preceding commas.
  • Return the appropriate substring of your 1/0 list.

Like this:

SELECT customer,
       COALESCE(
         SUBSTR(
           status,
           LENGTH(preceding_terms) - COALESCE(LENGTH(REPLACE(preceding_terms, ',')), 0),
           1
         ),
         '0'
       ) AS hasB
FROM   (
  SELECT customer,
         SUBSTR(','||products, 1, INSTR(','||products||',', ',B,')) AS preceding_terms,
         TRANSLATE(products_trial_status, 'tfrueals,', '10') AS status
  FROM   table_name
)

Which, for the sample data:

CREATE TABLE table_name ( customer, products, products_trial_status ) AS
SELECT 234253, 'A,B,C', 'false,true,false' FROM DUAL UNION ALL
SELECT 923403, 'A,C',   'true,true'        FROM DUAL UNION ALL
SELECT 123483, 'B',     'true'             FROM DUAL UNION ALL
SELECT 239874, 'B,C',   'false,false'      FROM DUAL;

Outputs:

CUSTOMER HASB
234253 1
923403 0
123483 1
239874 0

If you only want the customer numbers then you can add filters:

SELECT customer
FROM   (
  SELECT customer,
         SUBSTR(','||products, 1, INSTR(','||products||',', ',B,')) AS preceding_terms,
         TRANSLATE(products_trial_status, 'tfrueals,', '10') AS status
  FROM   table_name
  WHERE  INSTR(','||products||',', ',B,') > 0
)
WHERE   SUBSTR(
           status,
           LENGTH(preceding_terms) - COALESCE(LENGTH(REPLACE(preceding_terms, ',')), 0),
           1
         ) = '1'

Which outputs:

CUSTOMER
234253
123483

fiddle

CodePudding user response:

You can use hierarchical query in which split the strings by commas and count the number of commas along with regular expression functions such as

WITH t1 AS
(
 SELECT customer,
        REGEXP_SUBSTR(products,'[^,]',1,level) AS products,
        REGEXP_SUBSTR(products_trial_status,'[^,] ',1,level) AS products_ts
   FROM t -- your data source
CONNECT BY level <= REGEXP_COUNT(products,',') 1
    AND PRIOR customer = customer
    AND PRIOR sys_guid() IS NOT NULL  
)
SELECT customer
  FROM t1
 WHERE products = 'B'
   AND products_ts = 'true'    

Demo

  • Related