How to exclude words from strings in tables in the Oracle database using Python?-CodePudding

Oracle database: I am trying to exclude all the words that are in one table (ExceptionWords table) from each row of another table (OriginalWords table), and leave the original values as they are, and overwrite the cleaned ones against the originals corresponding to them into the Clear column. Example below:

ExceptionWords table

id	Word
1	Banana
2	Apple
3	Orange

OriginalWords table

id	Original	Clear
1	London, UK Banana	null
2	Moscow, Russia Banana Apple	null
3	Vine Banana Apple Orange	null
4	London, UK Banana	null

After processing the data from the Original column in the OriginalWords table, the cleared values should appear in the Clear column, from which all words from the ExtractionWords table are excluded.

id	Original	Clear
1	London, UK Banana	London, UK
2	Moscow, Russia Banana Apple	Moscow, Russia
3	Vine Banana Apple Orange	Vine
4	London, UK Banana	London, UK

The problem arose when I used DataFrame to process this data, then when processing the taken lines, it was impossible to return them back to the Clear field and so that they match the originals, since they were mixed, and also for the reason that the OriginalWords table may contain rows with the same values.

Perhaps, it is necessary to use dictionary lists with the values of the originalwords field as a key so that the cleared lines can be inserted into the Clear column in accordance with the originals, but now I can't even imagine how to do it...

CodePudding user response：

As everything is in the database, I'd suggest a stored procedure. Stored where? In the database as well, of course.

For sample data:

SQL> select * From originalwords;

        ID ORIGINAL                    CLEAR
---------- --------------------------- ------------------------------
         1 London, UK Banana
         2 Moscow, Russia Banana Apple
         3 Vine Banana Apple Orange
         4 London UK Banana

SQL> select * from exceptionwords;

        ID WORD
---------- ------
         1 Banana
         2 Apple
         3 Orange

A procedure that uses nested loops: 1st one loops through all original words, while the inner one fetches words that should be replaced.

SQL> create or replace procedure p_clear
  2  is
  3    l_str originalwords.clear%type;
  4  begin
  5    for cur_o in (select o.id, o.original from originalwords o) loop
  6      l_str := cur_o.original;
  7      for cur_e in (select e.id, e.word from exceptionwords e) loop
  8        l_str := replace(l_str, cur_e.word, '');
  9      end loop;
 10      update originalwords o set
 11        o.clear = l_str
 12        where o.id = cur_o.id;
 13    end loop;
 14  end;
 15  /

Procedure created.

Testing:

SQL> exec p_clear;

PL/SQL procedure successfully completed.

SQL> select * from originalwords;

        ID ORIGINAL                    CLEAR
---------- --------------------------- ------------------------------
         1 London, UK Banana           London, UK
         2 Moscow, Russia Banana Apple Moscow, Russia
         3 Vine Banana Apple Orange    Vine
         4 London UK Banana            London UK

SQL>

Looks OK to me.

CodePudding user response：

In Oracle, you can aggregate the exception words into a regular expression and then use REGEXP_REPLACE in a MERGE statement:

MERGE INTO originalWords dst
USING (
  SELECT ROWID As rid,
         TRIM(
           REGEXP_REPLACE(
             Original,
             '\s*('
             || (SELECT LISTAGG(word,'|') WITHIN GROUP (ORDER BY id)
                 FROM   ExceptionWords)
             || ')\s*',
             ' '
           )
         ) AS clear
  FROM   OriginalWords
) src
ON (src.rid = dst.ROWID)
WHEN MATCHED THEN
  UPDATE SET clear = src.clear;

Or, using REPLACE in a recursive subquery:

MERGE INTO originalWords dst
USING (
  WITH indexed_exception_words (rn, num_words, word) AS (
    SELECT ROW_NUMBER() OVER (ORDER BY id),
           COUNT(*) OVER (),
           word
    FROM   ExceptionWords
  ),
  replaced_words (rid, rn, num_words, clear) AS (
    SELECT o.ROWID,
           i.rn,
           i.num_words,
           REPLACE(o.original, i.word)
    FROM   OriginalWords o
           INNER JOIN indexed_exception_words i
           ON (i.rn = 1)
  UNION ALL
    SELECT r.rid,
           i.rn,
           r.num_words,
           REPLACE(r.clear, i.word)
    FROM   replaced_words r
           INNER JOIN indexed_exception_words i
           ON (i.rn = r.rn   1)
  )
  SELECT rid,
         TRIM(clear) As clear
  FROM   replaced_words
  WHERE  rn = num_words
) src
ON (src.rid = dst.ROWID)
WHEN MATCHED THEN
  UPDATE SET clear = src.clear;

Which for your sample data:

CREATE TABLE ExceptionWords (id, Word) AS
SELECT 1, 'Banana' FROM DUAL UNION ALL
SELECT 2, 'Apple' FROM DUAL UNION ALL
SELECT 3, 'Orange' FROM DUAL;

CREATE TABLE OriginalWords (id, Original, Clear) AS
SELECT 1, 'London, UK Banana',           CAST(NULL AS VARCHAR2(30)) FROM DUAL UNION ALL
SELECT 2, 'Moscow, Russia Banana Apple', NULL FROM DUAL UNION ALL
SELECT 3, 'Vine Banana Apple Orange',    NULL FROM DUAL UNION ALL
SELECT 4, 'London, UK Banana',           NULL FROM DUAL;

Then, after either MERGE statement:

SELECT * FROM OriginalWords;

Outputs:

ID ORIGINAL CLEAR

1 London, UK Banana London, UK

2 Moscow, Russia Banana Apple Moscow, Russia

3 Vine Banana Apple Orange Vine

4 London, UK Banana London, UK

db<>fiddle here