The table has 217K records.
I have a table which consist of bands column . It consists of sample data like 'GSM1800, GSM 900,LTE 100,LTE 200,WCDMA 100,CA 100, CA 200. My objective is to showcase GSM, LTE,WCDMA, CA in separate columns while selecting( basically using temp separate columns for each band type).
Initially I was doing a select using the below query ,
select tac,
bands,
string_agg( gsm, ', ' ORDER BY gsm) gsm,
string_agg( lte, ', ' ORDER BY lte) lte,
string_agg( ca, ', ' ORDER BY ca) ca,
string_agg( wcdma, ', ' ORDER BY wcdma) wcdma,
from (
select
tac,
bands,
array_to_string(regexp_matches(bands,'(GSM[\s \w ] )','g'), '') as gsm,
array_to_string(regexp_matches(bands,'(LTE[\s \w ] )','g'), '') as lte,
array_to_string(regexp_matches(bands,'(CA[\s \w \-] )','g'), '') as ca,
array_to_string(regexp_matches(bands,'(WCDMA[\s \w ] )','g'), '') as wcdma,
from
(select * from table_lookup )a )b
group by tac,bands
The above query was very slow. Later my lead suggested we should create the separate columns and update the table.
I tried to test it with 9 rows and tried to update GSM column.
Using the below query
UPDATE wireless_nta.tac_lookup
SET gsm = (select string_agg( gsm, ', ' ORDER BY gsm) gsm from
(select array_to_string(regexp_matches(bands,'(GSM[\s \w ] )','g'), '') as gsm from table_lookup )a)
FROM table_lookup AS m
inner JOIN table_lookup AS g on m.id = g.id
WHERE table_lookup.id = m.id and table_lookup.id <10;
this is causing weird update.
for example id =1 , bands column consist of GSM 1900,GSM850 (GSM800),WCDMA100,LTE200 but GSM is recorded as GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800, GSM 1800 instead of GSM 1900,GSM850 (GSM800).
CodePudding user response:
I think your INNER JOIN
clause is causing you problems. The FROM
clause in your UPDATE
query will already join the table once:
When a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the from_item list, and each output row of the join represents an update operation for the target table.
https://www.postgresql.org/docs/current/sql-update.html
Additionally, since you select the table again in the subquery, the FROM
clause in the UPDATE
query can be omitted completely.
Starting form this schema as a minimal example:
CREATE TABLE items (
"id" INTEGER,
"categories_original" TEXT
);
INSERT INTO items
("id", "categories_original")
VALUES
('1', 'A1,A2,A3,B1,C1'),
('2', 'C1'),
('3', 'A1,A1,B1,C1'),
('4', 'A1,A4,B1,C1'),
('5', 'A1,A3,B1,C1');
ALTER TABLE items
ADD COLUMN "categories_a" TEXT,
ADD COLUMN "categories_b" TEXT,
ADD COLUMN "categories_c" TEXT;
A single column can be updated as follows (for the fist 10 items):
-- o: outer item
-- i: inner item
-- l: category list
-- n: category name
UPDATE items o SET categories_a = (
SELECT string_agg(l.n, ', ' ORDER BY l.n)
FROM (
SELECT array_to_string(regexp_matches(i.categories_original,'(A[\s \w ] )', 'g'), '') n
FROM items i
WHERE i.id = o.id
) l
)
WHERE o.id < 10;