I have the column that is a string of values (with no particular order) separated by the comma:
event_list
2,100,101,102,103,104,105,106,110,114,121,126,152,185,191,524,150,198,158,111,20
100,101,102,103,104,110,114,121,126,152,175,185,191,150,198,158,111,123,10091
Of the values I am only interested in 1,2,10,11,12,13,14 and 20. The rest are irrelevant. For example 2 - "Product view" and 12 - "Add to cart".
So I am trying to do something like
CASE WHEN 2 IN event_list THEN 1 ELSE 0 END as product_view flag,
CASE WHEN 12 IN event_list THEN 1 ELSE 0 END as add_to_cart_flag
...
But since it's SQL and not Python, I don't think above is possible, hence trying to figure out how to do it. And I don't think using regex will be helpful, since '120' would contain '2' as well.
STRING_SPLIT
is not an optimal solution because the data is already 900Bil rows.
CodePudding user response:
A couple of methods would be
SELECT *
FROM YourTable yt
CROSS APPLY
(
SELECT
MAX(CASE WHEN value = '2' THEN 1 ELSE 0 END) as product_view_flag,
MAX(CASE WHEN value = '12' THEN 1 ELSE 0 END) as add_to_cart_flag
FROM STRING_SPLIT(yt.event_list, ',')
) ca
or
SELECT yt.*,
CASE WHEN adj_event_list LIKE '%,2,%' THEN 1 ELSE 0 END as product_view_flag,
CASE WHEN adj_event_list LIKE '%,12,%' THEN 1 ELSE 0 END as add_to_cart_flag
FROM YourTable yt
CROSS APPLY (SELECT CONCAT(',',yt.event_list,',')) CA(adj_event_list)
If you are actually running this on 900Bil rows both will be slow. I can't guess which will "win" - you would need to test both.