Home > front end >  How to concat rows in a groupby but keep all variables in Hive
How to concat rows in a groupby but keep all variables in Hive

Time:03-11

Let's say I have the following Minimal, Reproducible, Example (I am using Hive):

CREATE TABLE have (id INT, sub_id INT, name STRING, sub_name STRING, value STRING, A STRING, B STRING, C STRING);
INSERT INTO have VALUES
    (1,111111,'ITEM1','Score','AAA','concat1_1','concat2_1','concat3_1'),
    (1,111111,'ITEM1','Score','AAA','concat1_2','concat2_2','concat3_2'),
    (1,111111,'ITEM1','Weight','2 ','conc1_1','conc2_1','conc3_1'),
    (1,111111,'ITEM1','Weight','2 ','conc1_2','conc2_2','conc3_2'),
    (1,222222,'ITEM1','Score','BBB','c1_1','c2_1','c3_1'),
    (1,222222,'ITEM1','Score','BBB','c1_2','c2_2','c3_2')
    ;

I want to concatenate a, b and c accross rows with the same sub_id and sub_name

Using the concat_ws and a group by I can concatenate accross row:

create table want as
select sub_id, sub_name, 
   concat_ws("|", collect_set(a)) as a,
   concat_ws("|", collect_set(b)) as b,
   concat_ws("|", collect_set(c)) as c
from have
group by sub_id, sub_name;

But how can I retrieve also the other columns?

When I try

create table want as
select id, sub_id, name, sub_name, value, 
   concat_ws("|", collect_set(a)) as a, 
   concat_ws("|", collect_set(b)) as b, 
   concat_ws("|", collect_set(c)) as c
from have
group by sub_id, sub_name;

I get the following error:

Error while compiling statement: FAILED: SemanticException [Error 10025]: line 17:7 Expression not in GROUP BY key 'id'

My desired output would be:

 ---- -------- ------- --------- ------- --------------------- --------------------- --------------------- -- -- 
| id | sub_id | name  | subname | value |          a          |          b          |          c          |  |  |
 ---- -------- ------- --------- ------- --------------------- --------------------- --------------------- -- -- 
|  1 | 111111 | ITEM1 | Score   | AAA   | concat1_1|concat1_2 | concat2_1|concat2_2 | concat3_1|concat3_2 |  |  |
|  1 | 111111 | ITEM1 | Weight  | 2     | conc1_1|conc1_2     | conc2_1|conc2_2     |   conc3_1|conc3_2   |  |  |
|  1 | 222222 | ITEM1 | Score   | BBB   | c1_1|c1_2           |   c2_1|c2_2         |      c3_1|c3_2      |  |  |
 ---- -------- ------- --------- ------- --------------------- --------------------- --------------------- -- -- 

CodePudding user response:

Because using aggregate function need to add non-group by column in group by

select id, sub_id, name, sub_name, value, 
   concat_ws("|", collect_set(a)) as a, 
   concat_ws("|", collect_set(b)) as b, 
   concat_ws("|", collect_set(c)) as c
from have
group by id, sub_id, name, sub_name, value;
  • Related