I am trying to group a category of items based on specific words in one column. Example : in column "item" I have : wool red sweater wool blue trousers leather black shoes leather brown pants
I need to categorize those items in two different categories, "wool items" and "leather items". I need to categorize several items at the same time. I think I would need regexp to extract that one specific word, but have no idea how to use it. The query I've managed to write also returns null values, even though I am using NOTNULL includes rows where the items are neither wool or leather.
SELECT
DISTINCT item,
CASE WHEN item LIKE '%wool%'
AND NULLIF(item,'NULL') IS NOT NULL
THEN
1
END AS woolitems,
CASE WHEN item LIKE '%leather%'
AND NULLIF(item,'NULL') IS NOT NULL
THEN
1
END AS leatheritems,
FROM table
I need help understanding how to create one big category and how to have only the rows where the word I'm looking for returns. Thanks for your help :)
CodePudding user response:
It sounds like you are looking for something like
SELECT
DISTINCT item,
CASE WHEN item LIKE '%wool%'
AND NULLIF(item,'NULL') IS NOT NULL THEN 'wool item'
WHEN item LIKE '%leather%'
AND NULLIF(item,'NULL') IS NOT NULL THEN 'leather item'
ELSE 'other item'
END AS item_category
FROM `table`
This returns a table with 2 columns:
- item,
- item_category that will be:
- 'wool item' if there is wool in the item name,
- 'leather item' if there is leather in the item name,
- 'other item' otherwise
If you want to filter out the 'other item's, you'll just have to wrap it into a CTE:
WITH temp AS (<previous query>)
SELECT * FROM temp
WHERE item_category != 'other item'
What the query you provided does is returning 3 columns:
- item,
- woolitems → 1 if there's 'wool' in item name,
NULL
otherwise - leatheritems → 1 if there's 'leather' in item name,
NULL
otherwise
Therefore for the other items you only have NULL and NULL.
CodePudding user response:
Consider below approach
select item,
regexp_extract(item, 'wool|leather') category
from your_table
if applied to sample data in your question
with your_table as (
select 'wool red sweater' item union all
select 'wool blue trousers' union all
select 'leather black shoes' union all
select 'leather brown pants'
)
output is
CodePudding user response:
if for some reason you want to use your original query as a starting point - just wrap it up with unpivot as in below example
select item, category from (
SELECT
DISTINCT item,
CASE
WHEN item LIKE '%wool%' AND NULLIF(item,'NULL') IS NOT NULL
THEN 1
END AS woolitems,
CASE
WHEN item LIKE '%leather%' AND NULLIF(item,'NULL') IS NOT NULL
THEN 1
END AS leatheritems,
FROM your_table
)
unpivot (value for category in (woolitems, leatheritems))
with output