Let's say I have the following Minimal, Reproducible, Example (I am using Hive):
CREATE TABLE have (id INT, sub_id INT, name STRING, sub_name STRING, value STRING, A STRING, B STRING, C STRING);
INSERT INTO have VALUES
(1,111111,'ITEM1','Score','AAA','concat1_1','concat2_1','concat3_1'),
(1,111111,'ITEM1','Score','AAA','concat1_2','concat2_2','concat3_2'),
(1,111111,'ITEM1','Weight','2 ','conc1_1','conc2_1','conc3_1'),
(1,111111,'ITEM1','Weight','2 ','conc1_2','conc2_2','conc3_2'),
(1,222222,'ITEM1','Score','BBB','c1_1','c2_1','c3_1'),
(1,222222,'ITEM1','Score','BBB','c1_2','c2_2','c3_2')
;
I want to concatenate a
, b
and c
accross rows with the same sub_id
and sub_name
Using the concat_ws
and a group by
I can concatenate accross row:
create table want as
select sub_id, sub_name,
concat_ws("|", collect_set(a)) as a,
concat_ws("|", collect_set(b)) as b,
concat_ws("|", collect_set(c)) as c
from have
group by sub_id, sub_name;
But how can I retrieve also the other columns?
When I try
create table want as
select id, sub_id, name, sub_name, value,
concat_ws("|", collect_set(a)) as a,
concat_ws("|", collect_set(b)) as b,
concat_ws("|", collect_set(c)) as c
from have
group by sub_id, sub_name;
I get the following error:
Error while compiling statement: FAILED: SemanticException [Error 10025]: line 17:7 Expression not in GROUP BY key 'id'
My desired output would be:
---- -------- ------- --------- ------- --------------------- --------------------- --------------------- -- --
| id | sub_id | name | subname | value | a | b | c | | |
---- -------- ------- --------- ------- --------------------- --------------------- --------------------- -- --
| 1 | 111111 | ITEM1 | Score | AAA | concat1_1|concat1_2 | concat2_1|concat2_2 | concat3_1|concat3_2 | | |
| 1 | 111111 | ITEM1 | Weight | 2 | conc1_1|conc1_2 | conc2_1|conc2_2 | conc3_1|conc3_2 | | |
| 1 | 222222 | ITEM1 | Score | BBB | c1_1|c1_2 | c2_1|c2_2 | c3_1|c3_2 | | |
---- -------- ------- --------- ------- --------------------- --------------------- --------------------- -- --
CodePudding user response:
Because using aggregate function need to add non-group by column in group by
select id, sub_id, name, sub_name, value,
concat_ws("|", collect_set(a)) as a,
concat_ws("|", collect_set(b)) as b,
concat_ws("|", collect_set(c)) as c
from have
group by id, sub_id, name, sub_name, value;