Update unique values/combinations of concatenated columns from another table


I have a table, table2, with 26 columns (a–z) and around 2 million rows:

a b c d e f g h i j k l m n o p q r s t u v w x y z
18 2 2 22 0 2 1 2 1 3 1 2 1 3 26 2 0 22 0 22 2 32 2 4 2 2
20 2 2 2 0 0 0 2 1 4 0 2 1 4 24 0 0 2 0 2 1 3 2 5 0 0
10 2 2 222 0 2 1 2 1 2 1 2 1 2 24 0 2 2 0 2 1 3 1 5 0 2
12 2 2 3 0 0 0 0 1 3 0 0 1 3 21 2 0 0 0 0 0 22 1 4 2 0
15 2 2 3 0 0 0 0 1 3 0 0 1 3 21 2 0 2 0 2 1 22 1 4 2 0
20 2 2 2 0 0 0 0 1 4 0 0 1 4 20 2 0 2 0 0 0 22 2 4 2 0
15 2 2 22 0 0 0 0 1 2 0 0 1 2 21 2 0 2 0 0 0 22 2 4 2 0
18 2 2 22 0 0 0 0 1 3 0 0 1 3 21 2 0 2 0 0 0 22 1 4 2 0
8 2 0 22 0 2 1 0 1 3 1 0 1 3 24 0 0 2 0 0 0 3 2 5 0 2
14 2 2 3 0 2 1 0 1 3 1 0 1 3 12 0 2 22 0 2 1 22 2 3 0 2
14 2 0 222 0 22 2 2 2 2 2 2 2 2 20 0 0 0 0 0 0 3 3 4 0 22

And table1 has around 918,555 rows; its joins column holds comma-separated combinations of the 26 column names (a–z) from table2:

id sets joins result
1 2 a,b 0
2 4 a,c,f,i 0
3 5 a,d,o,x,y 0
4 2 a,e 0
5 2 a,f 0
6 3 a,g,x 0
7 2 a,h 0
8 5 a,i,l,r,s 0
9 2 a,j 0
10 2 a,k 0

The 2- to 7-column combinations in table1 were built according to the following criteria:

2 sets of concatenated data from 26 columns = 325 rows/combinations (like “a,b”, “a,c”, “a,d”…)
3 sets of concatenated data from 26 columns = 2,600 rows/combinations (like “a,b,c”, “a,b,d”, “a,b,e”…)
4 sets of concatenated data from 26 columns = 14,950 rows/combinations (like “a,b,c,d”, “a,b,c,e”, “a,b,c,f”…)
5 sets of concatenated data from 26 columns = 65,780 rows/combinations (like “a,b,c,d,e”, “a,b,c,d,f”, “a,b,c,d,g”…)
6 sets of concatenated data from 26 columns = 177,100 rows/combinations (like “a,b,c,d,e,f”, “a,b,c,d,e,g”, “a,b,c,d,e,h”…)
7 sets of concatenated data from 26 columns = 657,800 rows/combinations (like “a,b,c,d,e,f,g”, “a,b,c,d,e,f,h”, “a,b,c,d,e,f,i”…)
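For reference, combinations like these can be enumerated in alphabetical order with a short Python sketch (the helper name `make_joins` is mine, purely for illustration; `math.comb` confirms the 2-set, 3-set, and 7-set row counts above):

```python
from itertools import combinations
from math import comb

cols = "abcdefghijklmnopqrstuvwxyz"

def make_joins(k):
    """Return every k-column combination as a comma-separated string,
    in the same alphabetical order as the table1 examples."""
    return [",".join(c) for c in combinations(cols, k)]

# make_joins(2) yields "a,b", "a,c", "a,d", ... -- comb(26, 2) == 325 of them.
pairs = make_joins(2)
```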

I want each combination's result in table1 to be updated with the count of distinct value tuples of those columns in table2.
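To make the intent concrete: on the eleven sample rows above, counting distinct value tuples per combination reproduces the expected counts. A minimal Python check (the rows are hard-coded here purely for illustration):

```python
# Sample rows of table2 (columns a..z), copied from the table above.
rows_raw = """18 2 2 22 0 2 1 2 1 3 1 2 1 3 26 2 0 22 0 22 2 32 2 4 2 2
20 2 2 2 0 0 0 2 1 4 0 2 1 4 24 0 0 2 0 2 1 3 2 5 0 0
10 2 2 222 0 2 1 2 1 2 1 2 1 2 24 0 2 2 0 2 1 3 1 5 0 2
12 2 2 3 0 0 0 0 1 3 0 0 1 3 21 2 0 0 0 0 0 22 1 4 2 0
15 2 2 3 0 0 0 0 1 3 0 0 1 3 21 2 0 2 0 2 1 22 1 4 2 0
20 2 2 2 0 0 0 0 1 4 0 0 1 4 20 2 0 2 0 0 0 22 2 4 2 0
15 2 2 22 0 0 0 0 1 2 0 0 1 2 21 2 0 2 0 0 0 22 2 4 2 0
18 2 2 22 0 0 0 0 1 3 0 0 1 3 21 2 0 2 0 0 0 22 1 4 2 0
8 2 0 22 0 2 1 0 1 3 1 0 1 3 24 0 0 2 0 0 0 3 2 5 0 2
14 2 2 3 0 2 1 0 1 3 1 0 1 3 12 0 2 22 0 2 1 22 2 3 0 2
14 2 0 222 0 22 2 2 2 2 2 2 2 2 20 0 0 0 0 0 0 3 3 4 0 22"""

cols = "abcdefghijklmnopqrstuvwxyz"
table2 = [dict(zip(cols, line.split())) for line in rows_raw.splitlines()]

def distinct_count(joins):
    """Number of distinct value tuples for the given comma-separated columns,
    i.e. what SELECT COUNT(*) FROM (SELECT DISTINCT a, b FROM table2) x returns."""
    names = joins.split(",")
    return len({tuple(row[c] for c in names) for row in table2})
```

For example, `distinct_count("a,b")` gives 7, matching the expected output shown further down.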

I tried this basic query, but I am getting an error:

update table1 set result = (SELECT COUNT (*) 
FROM (SELECT DISTINCT (select concat(joins) from table1) FROM table2) as dists);

The expected unique result looks like this:

   joins   | result
----------- --------
 a,b       |      7
 a,c,f,i   |      9
 a,d,o,x,y |     11
 a,e       |      7
 a,f       |      9
 a,g,x     |     10
 a,h       |     10
 a,i,l,r,s |     10
 a,j       |      9
 a,k       |      9

CodePudding user response:

As mentioned in the comment, you can use PL/pgSQL dynamic commands to construct your queries:

do $$ 
declare 
  rec record;
  current_joins text;
  current_result int;
begin
  -- One dynamically built query per combination stored in table1.joins.
  for rec in select joins from table1 loop
     current_joins := rec.joins;
     -- The column list ("a,b", "a,c,f,i", ...) is spliced into the query text.
     execute format('select count(*) 
                     from (
                        select distinct %1$s
                        from table2 
                     ) as some_alias;',
                     current_joins) 
              into current_result;
     update table1 set result = current_result where joins = current_joins;
  end loop;
end $$;

This simply loops over your "joins" and executes the query once for each of them, updating the corresponding "result" field:

select * from table1;
   joins   | result
----------- --------
 a,b       |      7
 a,c,f,i   |      9
 a,d,o,x,y |     11
 a,e       |      7
 a,f       |      9
 a,g,x     |     10
 a,h       |     10
 a,i,l,r,s |     10
 a,j       |      9
 a,k       |      9

Online demo. The above is just to show the principle; I don't expect you actually want one random "count" as your result. It'd help if you explained your logic a bit more, maybe adding a clarified example result.

CodePudding user response:

The first point that should be made is that this is not a proper relational structure.

Assuming you have no control over this, one thought is to do an unnest of each table then join them together:

select id, joins,
  unnest(string_to_array(joins, ',')) as col_name
from table1

select row_number() over(order by 1) as rowNr,
unnest(array['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']) AS col_name,
unnest(array[a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z]) AS col_value
from table2

Then you could join them together and string_agg column name and column value in table2 grouping by table1 id and RowNr to help you get your unique results. Something like this.

select t1.id, t1.joins,
  string_agg(t2.col_name || '_' ||
    cast(t2.col_value as varchar), ',' order by t2.col_name) as row_vals
from unnested_t1 t1  -- the two queries above, wrapped as CTEs (names are placeholders)
join unnested_t2 t2 on t2.col_name = t1.col_name
group by t1.id, t1.joins, t2.rowNr

It would then be a matter of counting distinct row values per id.
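The same idea can be sketched outside the database (with made-up names and only a handful of columns) to show why grouping the unnested triples by source row number yields one aggregated "name_value" string per table2 row, whose distinct count is the answer:

```python
# Two sample table2 rows, unnested into (row_nr, col_name, col_value) triples
# the way the second query above does.
table2 = [
    {"a": 18, "b": 2, "c": 2},   # rowNr 1 (three columns only, for brevity)
    {"a": 20, "b": 2, "c": 2},   # rowNr 2
]
unnested = [
    (nr, name, value)
    for nr, row in enumerate(table2, start=1)
    for name, value in row.items()
]

def row_vals(joins):
    """string_agg equivalent: one 'name_value,...' string per table2 row,
    restricted to the columns named in joins, ordered by column name."""
    names = joins.split(",")
    per_row = {}
    for nr, name, value in unnested:
        if name in names:
            per_row.setdefault(nr, []).append(f"{name}_{value}")
    return [",".join(sorted(vals)) for vals in per_row.values()]

# Counting distinct aggregated strings gives the per-combination result.
result = len(set(row_vals("a,b")))
```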

I have no way of knowing how well this would perform and I did not include every column name. You would have to validate column names/test/check syntax compatibility with your version.
