Home > database >  Group a category of items based on one word in column
Group a category of items based on one word in column

Time:02-16

I am trying to group a category of items based on specific words in one column. Example : in column "item" I have : wool red sweater wool blue trousers leather black shoes leather brown pants

I need to categorize those items in two different categories, "wool items" and "leather items". I need to categorize several items at the same time. I think I would need regexp to extract that one specific word, but have no idea how to use it. The query I've managed to write also returns null values, even though I am using NOTNULL includes rows where the items are neither wool or leather.

    SELECT 
DISTINCT item,
       CASE WHEN item LIKE '%wool%' 
       AND NULLIF(item,'NULL') IS NOT NULL
       THEN
            1
       END AS woolitems,
       CASE WHEN item LIKE '%leather%' 
       AND NULLIF(item,'NULL') IS NOT NULL
       THEN
            1
       END AS leatheritems,
FROM   table

I need help understanding how to create one big category and how to have only the rows where the word I'm looking for returns. Thanks for your help :)

CodePudding user response:

It sounds like you are looking for something like

SELECT
    DISTINCT item,
    CASE WHEN item LIKE '%wool%' 
           AND NULLIF(item,'NULL') IS NOT NULL THEN 'wool item'
        WHEN item LIKE '%leather%' 
           AND NULLIF(item,'NULL') IS NOT NULL THEN 'leather item'
        ELSE 'other item'
    END AS item_category
FROM `table`

This returns a table with 2 columns:

  • item,
  • item_category that will be:
    • 'wool item' if there is wool in the item name,
    • 'leather item' if there is leather in the item name,
    • 'other item' otherwise

If you want to filter out the 'other item's, you'll just have to wrap it into a CTE:

WITH temp AS (<previous query>)

SELECT * FROM temp
WHERE item_category != 'other item'

What the query you provided does is returning 3 columns:

  • item,
  • woolitems → 1 if there's 'wool' in item name, NULL otherwise
  • leatheritems → 1 if there's 'leather' in item name, NULL otherwise

Therefore for the other items you only have NULL and NULL.

CodePudding user response:

Consider below approach

select item, 
  regexp_extract(item, 'wool|leather') category
from your_table          

if applied to sample data in your question

with your_table as (
  select 'wool red sweater' item union all
  select 'wool blue trousers' union all
  select 'leather black shoes' union all
  select 'leather brown pants' 
)            

output is

enter image description here

CodePudding user response:

if for some reason you want to use your original query as a starting point - just wrap it up with unpivot as in below example

select item, category from (
  SELECT 
    DISTINCT item,
    CASE 
      WHEN item LIKE '%wool%' AND NULLIF(item,'NULL') IS NOT NULL
      THEN 1
    END AS woolitems,
    CASE 
      WHEN item LIKE '%leather%' AND NULLIF(item,'NULL') IS NOT NULL
      THEN 1
    END AS leatheritems,
  FROM   your_table
)
unpivot (value for category in (woolitems, leatheritems))   

with output

enter image description here

  • Related