I'm having problems with the following code and I was wondering if anyone could help me resolve this issue. I have two tables, tbl
and stopwords
which you can recreate with the following query :
CREATE TABLE stopwords(
stopword char(100) NOT NULL
);
insert into stopwords values ('I');
insert into stopwords values ('in');
insert into stopwords values ('on');
insert into stopwords values ('of');
insert into stopwords values ('to');
insert into stopwords values ('a');
insert into stopwords values ('too');
CREATE TABLE tbl (
id INT,
message TEXT
);
INSERT INTO tbl VALUES
(1, 'in in in in of of of of on too we are on top of the world'),
(2, 'too bad the apple is rotten'),
(3, 'there is no I in team but a man thinks'),
(4, 'it matters where you go to a random');
I want to update the message column in tbh while removing each stopword from the stopwords table. However, I want to do this process iteratively, instead of manually writing out an update statement for each stopword.
I tried the following query and no errors occur but the message column does not update to anything and does not return any errors.
update tbl
SET message = TRIM(
REPLACE(CONCAT(' ', REPLACE(message, sw.stopword, sw.stopword), ' '), sw.stopword, '')
)
from stopwords as sw
where CONCAT(' ', message , ' ') like CONCAT('%' ,sw.stopword , '%');
Can anyone help me with debugging this?
CodePudding user response:
You need to define a function. We start by adding a space before and after the line. We then loop through the words from stopwords with replace with space. We keep running each replace until the length after the remove is the same as before it. Finally we use TRIM
to remove the spaces before and after the string.
CREATE TABLE stopwords( stopword char(100) NOT NULL ); insert into stopwords values ('I'); insert into stopwords values ('in'); insert into stopwords values ('on'); insert into stopwords values ('of'); insert into stopwords values ('to'); insert into stopwords values ('a'); insert into stopwords values ('too'); CREATE TABLE tbl ( id INT, message TEXT ); INSERT INTO tbl VALUES (1, 'in in in in of of of of on too we are on top of the world'), (2, 'too bad the apple is rotten'), (3, 'there is no I in team but a man thinks'), (4, 'it matters where you go to a random');
✓
1 rows affected
1 rows affected
1 rows affected
1 rows affected
1 rows affected
1 rows affected
1 rows affected
✓
4 rows affected
create function removeStop(raw_string varchar(100)) returns varchar(100) language plpgsql as $$ declare raw_str varchar(100); pat varchar(10); l1 int; l2 int; tRow record; -- variable declaration begin raw_str = concat(' ',raw_string,' '); for tRow in select stopword s from stopwords loop pat = concat(' ',trim(from tRow.s),' '); l1=1000;l2=length(raw_str); while l1>l2 loop l1=l2; raw_str = replace(raw_str,pat,' '); l2 = length(raw_str); end loop; end loop; return trim(from raw_str); --return pat; -- logic end; $$
✓
select * from tbl;
id | message -: | :-------------------------------------------------------- 1 | in in in in of of of of on too we are on top of the world 2 | too bad the apple is rotten 3 | there is no I in team but a man thinks 4 | it matters where you go to a random
update tbl set message=removeStop(message);
4 rows affected
select * from tbl;
id | message -: | :------------------------------ 1 | we are top the world 2 | bad the apple is rotten 3 | there is no team but man thinks 4 | it matters where you go random
db<>fiddle here