So I have data as follows:
ID tags
001 apple, banana
001 NA
002 berry, blue, banana
003 melon, apple, grape
002 grape
001 apple, banana
001 grape
All I would like to do is make a new table which gathers all text into one cell as follows:
ID tag_full
001 apple, banana, apple, banana, grape
002 berry, blue, banana, grape
003 melon, apple, grape
So all values of tags are in one cell for each ID. Order does not matter, but they must be separated by commas.
CodePudding user response:
how about this :
select id , listagg(tags , ',') as tag_full
from tablename
where tags <> 'NA'
group by id
CodePudding user response:
so the same answer as eshirvana:
SELECT
id,
listagg(tags, ',') as full_tags
FROM VALUES
(001, 'apple, banana'),
(001, 'NA'),
(002, 'berry, blue, banana'),
(003, 'melon, apple, grape'),
(002, 'grape'),
(001, 'apple, banana'),
(001, 'grape')
v(id, tags)
WHERE tags != 'NA'
GROUP BY 1
ORDER BY 1;
but it seems rather gross to aggregate in SQL and the count the data in python, when it can be done directly in SQL:
SELECT
id,
array_agg(object_construct(tag, tag_count)) WITHIN GROUP (ORDER BY tag_count desc) as full_tags
FROM (
SELECT
id
,trim(t.value) as tag
,count(*) as tag_count
FROM VALUES
(001, 'apple, banana'),
(001, 'NA'),
(002, 'berry, blue, banana'),
(003, 'melon, apple, grape'),
(002, 'grape'),
(001, 'apple, banana'),
(001, 'grape')
v(id, tags)
,table(split_to_table(tags, ',')) as t
WHERE tags != 'NA'
GROUP BY 1,2
)
GROUP BY 1
ORDER BY 1;
ID | FULL_TAGS |
---|---|
1 | [ { "apple": 2 }, { "banana": 2 }, { "grape": 1 } ] |
2 | [ { "berry": 1 }, { "blue": 1 }, { "banana": 1 }, { "grape": 1 } ] |
3 | [ { "melon": 1 }, { "grape": 1 }, { "apple": 1 } ] |