Oracle database: I am trying to exclude all the words that are in one table (ExceptionWords table) from each row of another table (OriginalWords table), and leave the original values as they are, and overwrite the cleaned ones against the originals corresponding to them into the Clear column. Example below:
ExceptionWords table
id | Word |
---|---|
1 | Banana |
2 | Apple |
3 | Orange |
OriginalWords table
id | Original | Clear |
---|---|---|
1 | London, UK Banana | null |
2 | Moscow, Russia Banana Apple | null |
3 | Vine Banana Apple Orange | null |
4 | London, UK Banana | null |
After processing the data from the Original column in the OriginalWords table, the cleared values should appear in the Clear column, from which all words from the ExtractionWords table are excluded.
id | Original | Clear |
---|---|---|
1 | London, UK Banana | London, UK |
2 | Moscow, Russia Banana Apple | Moscow, Russia |
3 | Vine Banana Apple Orange | Vine |
4 | London, UK Banana | London, UK |
The problem arose when I used DataFrame to process this data, then when processing the taken lines, it was impossible to return them back to the Clear field and so that they match the originals, since they were mixed, and also for the reason that the OriginalWords table may contain rows with the same values.
Perhaps, it is necessary to use dictionary lists with the values of the originalwords field as a key so that the cleared lines can be inserted into the Clear column in accordance with the originals, but now I can't even imagine how to do it...
CodePudding user response:
As everything is in the database, I'd suggest a stored procedure. Stored where? In the database as well, of course.
For sample data:
SQL> select * From originalwords;
ID ORIGINAL CLEAR
---------- --------------------------- ------------------------------
1 London, UK Banana
2 Moscow, Russia Banana Apple
3 Vine Banana Apple Orange
4 London UK Banana
SQL> select * from exceptionwords;
ID WORD
---------- ------
1 Banana
2 Apple
3 Orange
A procedure that uses nested loops: 1st one loops through all original words, while the inner one fetches words that should be replaced.
SQL> create or replace procedure p_clear
2 is
3 l_str originalwords.clear%type;
4 begin
5 for cur_o in (select o.id, o.original from originalwords o) loop
6 l_str := cur_o.original;
7 for cur_e in (select e.id, e.word from exceptionwords e) loop
8 l_str := replace(l_str, cur_e.word, '');
9 end loop;
10 update originalwords o set
11 o.clear = l_str
12 where o.id = cur_o.id;
13 end loop;
14 end;
15 /
Procedure created.
Testing:
SQL> exec p_clear;
PL/SQL procedure successfully completed.
SQL> select * from originalwords;
ID ORIGINAL CLEAR
---------- --------------------------- ------------------------------
1 London, UK Banana London, UK
2 Moscow, Russia Banana Apple Moscow, Russia
3 Vine Banana Apple Orange Vine
4 London UK Banana London UK
SQL>
Looks OK to me.
CodePudding user response:
In Oracle, you can aggregate the exception words into a regular expression and then use REGEXP_REPLACE
in a MERGE
statement:
MERGE INTO originalWords dst
USING (
SELECT ROWID As rid,
TRIM(
REGEXP_REPLACE(
Original,
'\s*('
|| (SELECT LISTAGG(word,'|') WITHIN GROUP (ORDER BY id)
FROM ExceptionWords)
|| ')\s*',
' '
)
) AS clear
FROM OriginalWords
) src
ON (src.rid = dst.ROWID)
WHEN MATCHED THEN
UPDATE SET clear = src.clear;
Or, using REPLACE
in a recursive subquery:
MERGE INTO originalWords dst
USING (
WITH indexed_exception_words (rn, num_words, word) AS (
SELECT ROW_NUMBER() OVER (ORDER BY id),
COUNT(*) OVER (),
word
FROM ExceptionWords
),
replaced_words (rid, rn, num_words, clear) AS (
SELECT o.ROWID,
i.rn,
i.num_words,
REPLACE(o.original, i.word)
FROM OriginalWords o
INNER JOIN indexed_exception_words i
ON (i.rn = 1)
UNION ALL
SELECT r.rid,
i.rn,
r.num_words,
REPLACE(r.clear, i.word)
FROM replaced_words r
INNER JOIN indexed_exception_words i
ON (i.rn = r.rn 1)
)
SELECT rid,
TRIM(clear) As clear
FROM replaced_words
WHERE rn = num_words
) src
ON (src.rid = dst.ROWID)
WHEN MATCHED THEN
UPDATE SET clear = src.clear;
Which for your sample data:
CREATE TABLE ExceptionWords (id, Word) AS
SELECT 1, 'Banana' FROM DUAL UNION ALL
SELECT 2, 'Apple' FROM DUAL UNION ALL
SELECT 3, 'Orange' FROM DUAL;
CREATE TABLE OriginalWords (id, Original, Clear) AS
SELECT 1, 'London, UK Banana', CAST(NULL AS VARCHAR2(30)) FROM DUAL UNION ALL
SELECT 2, 'Moscow, Russia Banana Apple', NULL FROM DUAL UNION ALL
SELECT 3, 'Vine Banana Apple Orange', NULL FROM DUAL UNION ALL
SELECT 4, 'London, UK Banana', NULL FROM DUAL;
Then, after either MERGE
statement:
SELECT * FROM OriginalWords;
Outputs:
ID ORIGINAL CLEAR 1 London, UK Banana London, UK 2 Moscow, Russia Banana Apple Moscow, Russia 3 Vine Banana Apple Orange Vine 4 London, UK Banana London, UK
db<>fiddle here