I have a dataframe of User IDs and Tags as shown below under 'Current Data' .
The Goal:
I want to be able to duplicate records per each value under the tags column. As you can see in the target output, user ID 21 is repeated 3x for each of the three tags that are in the source 'TAGS' - everything is duplicated except the Tag column - 1 Record per item in the comma separated list.
Issue:
I looked at using the SPLIT_TO_TABLE functionality in snowflake but it doesn't work in my use case as not all the tags are consistently in some kind of order and in some cases, the cell is also blank.
Current Data:
USER_ID CITY STATUS PPL TAGS
21 LA checked 6 bad ui/ux,dashboards/reporting,pricing
32 SD checked 9 buggy,laggy
21 ATL checked 9
234 MIA checked 5 glitchy, bad ui/ux, horrible
The target:
USER_ID CITY STATUS PPL TAGS
21 LA checked 6 bad ui/ux
21 LA checked 6 dashboards/reporting
21 LA checked 6 Pricing
32 SD checked 9 buggy
32 SD checked 9 laggy
21 ATL checked 9
234 MIA checked 5 glitchy
234 MIA checked 5 bad ui/ux
234 MIA checked 5 horrible
Sql:
select table1.value
from table(split_to_table('a.b', '.')) as table1
CodePudding user response:
SPLIT_TO_TABLE works. Below is the query using your sample data:
select USER_ID, CITY, STATUS, PPL, VALUE
from (values
(21,'LA','checked',6,'bad ui/ux,dashboards/reporting,pricing')
,(32,'SD','checked',9,'buggy,laggy')
,(21,'ATL','checked',9,'')
,(234,'MIA','checked',5,'glitchy, bad ui/ux, horrible')
) as tbl (USER_ID,CITY,STATUS,PPL,TAGS)
, lateral split_to_table(tbl.tags,',');
Result:
USER_ID CITY STATUS PPL VALUE
21 LA checked 6 bad ui/ux
21 LA checked 6 dashboards/reporting
21 LA checked 6 pricing
32 SD checked 9 buggy
32 SD checked 9 laggy
21 ATL checked 9
234 MIA checked 5 glitchy
234 MIA checked 5 bad ui/ux
234 MIA checked 5 horrible