How to transpose values in rows to columns in MySQL

Time:05-10

This image shows what my raw table looks like:

[image of the raw table]

The transposed table has to satisfy the following conditions:

  1. Each row has a unique id.
  2. We only need columns for groups A, B and C in the group field, not for any other group.
  3. There could be one or more ids for group A for the same app id; I need the row with the minimum date.
  4. There could be one or more ids for groups B and C for the same app id; I need the rows with the maximum date.

The image below shows what my final table should look like:

[image of the final table]

CodePudding user response:

  1. Each row has a unique id

  2. We only need columns for groups A,B,C in the group field and not others.

add this to your query

WHERE `GROUP` IN ('A','B','C')

  3. There could be one or more ids for group A for the same app id; I need the row with the minimum date.

add somewhere after the SELECT:

MIN(date) OVER (PARTITION BY appid, `group`)

  4. There could be one or more ids for groups B and C for the same app id; I need the rows with the maximum date.

change the added option on point 3 to:

CASE WHEN `group` IN ('B','C')
     THEN MAX(date) OVER (PARTITION BY appid, `group`)
     ELSE MIN(date) OVER (PARTITION BY appid, `group`)
     END

Maybe this helps you make a serious attempt at solving this yourself (and learn from it) instead of asking for a solution and then copy/pasting it...

BTW: Naming fields with keywords like GROUP (a reserved word in MySQL) and DATE is not a very smart thing to do. A better name for the column GROUP might be CategoryGroup (or whatever this group is referring to).
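
To see how the hints in this answer fit together, here is a minimal sketch rather than a tested MySQL solution. It runs the query against an in-memory SQLite database from Python so it can be tried without a server. The table name `tab`, the column names `app_id`, `grp` and `dt`, and the sample rows are all made up for illustration, and dates are assumed to be stored as ISO `YYYY-MM-DD` text. The windowed CASE expression picks the wanted date per app and group, and the outer query keeps only the rows that match it; the same SQL should carry over to MySQL 8.0 or later, which supports window functions.

```python
import sqlite3

# Hypothetical sample data: (id, app_id, dt, grp, score1, score2).
ROWS = [
    (1, 10, "2021-01-05", "A", 1, 2),
    (2, 10, "2021-03-01", "A", 3, 4),
    (3, 10, "2021-02-01", "B", 5, 6),
    (4, 10, "2021-04-01", "B", 7, 8),
    (5, 10, "2021-05-01", "D", 9, 9),  # group D must be ignored
]

# Inner query: compute the wanted date per (app_id, grp).
# Outer query: keep only the rows whose date equals the wanted date.
QUERY = """
SELECT id, app_id, grp, dt
FROM (
    SELECT t.*,
           CASE WHEN grp IN ('B', 'C')
                THEN MAX(dt) OVER (PARTITION BY app_id, grp)
                ELSE MIN(dt) OVER (PARTITION BY app_id, grp)
           END AS wanted_dt
    FROM tab AS t
    WHERE grp IN ('A', 'B', 'C')
) AS picked
WHERE dt = wanted_dt
ORDER BY id
"""


def pick_rows(rows):
    """Load the sample rows into SQLite and return the selected rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tab (id INTEGER, app_id INTEGER, dt TEXT, "
        "grp TEXT, score1 INTEGER, score2 INTEGER)"
    )
    con.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pick_rows(ROWS):
        print(row)
```

With the sample rows, only id 1 (earliest A) and id 4 (latest B) survive. If two rows of the same app and group share the wanted date, both are returned, so a tie-breaker would be needed in that case.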

CodePudding user response:

I took a different approach to this. The SQL is longer but I think it's more auditable.

The main logic point is that I broke A and BC into 2 different subqueries, and used QUALIFY ROW_NUMBER() to choose the correct row, based on either ASC or DESC per your requirements.

I know you are using MySQL, and this might not work there as written since I don't have an instance to test it on, but here is the SQL I got from building this logic in Rasgo, which I tested on Snowflake and it worked.

-- This splits the data into group A only
WITH CTE_A AS (
  SELECT 
    * 
  FROM 
    {{ your_table }}
  WHERE 
    my_group = 'A'
), 
-- This splits the data into group B and C only
CTE_B AS (
  SELECT 
    * 
  FROM 
    {{ your_table }}
  WHERE 
    my_group IN('B', 'C')
), 
-- Selecting from A only, it keeps the earliest row (date ASCENDING)
CTE_A_FIRST AS (
  SELECT 
    * 
  FROM 
    CTE_A QUALIFY ROW_NUMBER() OVER (
      PARTITION BY APP_ID, 
      MY_GROUP 
      ORDER BY 
        MY_DATE ASC
    ) = 1
), 
-- Selecting from B and C only, it keeps the most recent row (date DESCENDING)
CTE_B_LAST AS (
  SELECT 
    * 
  FROM 
    CTE_B QUALIFY ROW_NUMBER() OVER (
      PARTITION BY APP_ID, 
      MY_GROUP 
      ORDER BY 
        MY_DATE DESC
    ) = 1
), 
-- Here we just union A and BC back to one another
CTE_ABC AS (
  SELECT 
    ID, 
    APP_ID, 
    MY_DATE, 
    MY_GROUP, 
    SCORE1, 
    SCORE2 
  FROM 
    CTE_B_LAST 
  UNION ALL 
  SELECT 
    ID, 
    APP_ID, 
    MY_DATE, 
    MY_GROUP, 
    SCORE1, 
    SCORE2 
  FROM 
    CTE_A_FIRST
), 
-- We pivot the date horizontally so we get a date for A B C
-- the MIN does not matter, since at this point, we only have 1
CTE_PVT_DATE AS (
  SELECT 
    APP_ID, 
    B, 
    C, 
    A 
  FROM 
    (
      SELECT 
        APP_ID, 
        MY_DATE, 
        MY_GROUP 
      FROM 
        CTE_ABC
    ) PIVOT (
      MIN (MY_DATE) FOR MY_GROUP IN ('B', 'C', 'A')
    ) as p (APP_ID, B, C, A)
), 
-- We pivot the SCORE1 horizontally so we get a SCORE1 value for A B C
-- the MIN does not matter, since at this point, we only have 1
CTE_PVT_SCORE1 AS (
  SELECT 
    APP_ID, 
    B, 
    C, 
    A 
  FROM 
    (
      SELECT 
        APP_ID, 
        SCORE1, 
        MY_GROUP 
      FROM 
        CTE_ABC
    ) PIVOT (
      MIN (SCORE1) FOR MY_GROUP IN ('B', 'C', 'A')
    ) as p (APP_ID, B, C, A)
), 
-- We pivot the SCORE2 horizontally so we get a SCORE2 value for A B C
-- the MIN does not matter, since at this point, we only have 1
CTE_PVT_SCORE2 AS (
  SELECT 
    APP_ID, 
    B, 
    C, 
    A 
  FROM 
    (
      SELECT 
        APP_ID, 
        SCORE2, 
        MY_GROUP 
      FROM 
        CTE_ABC
    ) PIVOT (
      MIN (SCORE2) FOR MY_GROUP IN ('B', 'C', 'A')
    ) as p (APP_ID, B, C, A)
), 
-- We join the subqueries above together on the APP_IDs
CTE_JOINED AS (
  SELECT 
    t0.*, 
    t1.APP_ID as SCORE1_APP_ID, 
    t1.B as SCORE1_B, 
    t1.C as SCORE1_C, 
    t1.A as SCORE1_A, 
    t2.APP_ID as SCORE2_APP_ID, 
    t2.B as SCORE2_B, 
    t2.C as SCORE2_C, 
    t2.A as SCORE2_A 
  FROM 
    CTE_PVT_DATE t0 
    INNER JOIN CTE_PVT_SCORE1 t1 ON t0.APP_ID = t1.APP_ID 
    INNER JOIN CTE_PVT_SCORE2 t2 ON t0.APP_ID = t2.APP_ID
) 
-- The final select is really just renaming ... 
-- the magic has already happened
SELECT 
  A AS DATE_A, 
  B AS DATE_B, 
  C AS DATE_C, 
  APP_ID, 
  SCORE1_B, 
  SCORE1_C, 
  SCORE1_A, 
  SCORE2_B, 
  SCORE2_C, 
  SCORE2_A 
FROM 
  CTE_JOINED
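
QUALIFY and PIVOT are Snowflake syntax and, as far as I know, MySQL offers neither, so the row-picking part would have to be written with a derived table there. Below is a rough sketch of that rewrite, run against an in-memory SQLite database from Python so it can be tried without a server. The table name `your_table`, the lower-case column names and the sample rows are assumptions for illustration, with dates stored as ISO text. ROW_NUMBER() is computed in a subquery and filtered in the outer WHERE, which is what QUALIFY does in one step.

```python
import sqlite3

# Hypothetical sample data: (id, app_id, my_date, my_group).
ROWS = [
    (1, 1, "2022-01-10", "A"),
    (2, 1, "2022-02-10", "A"),
    (3, 1, "2022-01-15", "B"),
    (4, 1, "2022-03-15", "B"),
    (5, 1, "2022-02-20", "C"),
    (6, 2, "2022-05-01", "A"),
]

# rn_first numbers rows oldest-first, rn_last newest-first,
# both within each (app_id, my_group) partition.
QUERY = """
SELECT id, app_id, my_group, my_date
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY app_id, my_group
               ORDER BY my_date ASC
           ) AS rn_first,
           ROW_NUMBER() OVER (
               PARTITION BY app_id, my_group
               ORDER BY my_date DESC
           ) AS rn_last
    FROM your_table AS t
    WHERE my_group IN ('A', 'B', 'C')
) AS ranked
WHERE (my_group = 'A' AND rn_first = 1)
   OR (my_group IN ('B', 'C') AND rn_last = 1)
ORDER BY app_id, my_group
"""


def first_a_last_bc(rows):
    """Return the earliest A row and the latest B/C rows per app."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE your_table (id INTEGER, app_id INTEGER, "
        "my_date TEXT, my_group TEXT)"
    )
    con.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in first_a_last_bc(ROWS):
        print(row)
```

Unlike a comparison against MIN or MAX, ROW_NUMBER() always keeps exactly one row per app and group, even when two rows share the same date.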

CodePudding user response:

I'll walk through my attempt in several steps and then show the full solution built from them, so that you can understand it piece by piece. I assume the following definition of your input table:

-- `date` and `group` are quoted because GROUP is a reserved word in MySQL
CREATE TABLE tab(
    id      INT,
    app_id  INT,
    `date`  VARCHAR(20),
    `group` VARCHAR(20),
    score1  INT,
    score2  INT
);

STEP 1. Formatting date using a proper DATE format ("YYYY-MM-DD"). For this purpose the function STR_TO_DATE can come in handy.

WITH formatted_tab AS (
    SELECT id,
           app_id,
           STR_TO_DATE(date, '%m/%d/%Y') AS date,
           group,
           score1,
           score2
    FROM   tab
)
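
Why the conversion matters: if the dates are stored as month/day/year text, which the '%m/%d/%Y' format above implies, MIN and MAX would compare them character by character rather than chronologically. The following Python sketch shows the same effect outside the database. Python's `strptime` happens to use the same `%m/%d/%Y` specifiers; the two sample values and the helper names are invented for illustration.

```python
from datetime import date, datetime

# Invented sample values in month/day/year text form.
RAW_DATES = ["12/01/2021", "02/15/2022"]


def parse_mdy(text: str) -> date:
    """Parse 'MM/DD/YYYY' text into a date, similar to STR_TO_DATE(text, '%m/%d/%Y')."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def earliest_as_text(values):
    # Plain string comparison: '0...' sorts before '1...', whatever the year is.
    return min(values)


def earliest_as_date(values):
    # Compare real dates, then return the result in ISO form.
    return min(parse_mdy(v) for v in values).isoformat()


if __name__ == "__main__":
    print(earliest_as_text(RAW_DATES))  # 02/15/2022 (wrong: this is the later date)
    print(earliest_as_date(RAW_DATES))  # 2021-12-01
```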

STEP 2. Extracting the useful dates according to the group field. Since group "A" is treated differently from groups "B" and "C", the idea here is to address each case with a separate query, where

  • in the former case the MIN aggregation function is applied,
  • in the latter case the MAX aggregation function is applied,

Then the two output result sets are combined with a UNION operation.

(
SELECT   app_id,
         MIN(date)  AS date,
         group      
FROM     formatted_tab
WHERE    group IN ('A')
GROUP BY app_id, 
         group 

UNION

SELECT   app_id,
         MAX(date)  AS date,
         group      
FROM     formatted_tab
WHERE    group IN ('B', 'C')
GROUP BY app_id, 
         group
) needed_dates

STEP 3. Getting back scores corresponding to group and date field. This is done with a simple INNER JOIN between the last generated table and the formatted table.

(
SELECT needed_dates.*,
       formatted_tab.score1,
       formatted_tab.score2
FROM       needed_dates
INNER JOIN formatted_tab
        ON needed_dates.app_id = formatted_tab.app_id
       AND needed_dates.date   = formatted_tab.date
       AND needed_dates.group  = formatted_tab.group
) needed_infos

STEP 4. Pivoting the table exploiting MySQL tools like:

  • the IF() function to retrieve the values corresponding to a specific group
  • the MAX aggregation function, to aggregate on the same group

These tools are applied for each group you specified ('A', 'B' and 'C').

SELECT app_id,
       MAX(IF(group='A', date  , NULL)) AS date_groupA,
       MAX(IF(group='B', date  , NULL)) AS date_groupB,
       MAX(IF(group='C', date  , NULL)) AS date_groupC,
       MAX(IF(group='A', score1, NULL)) AS score1_groupA,
       MAX(IF(group='A', score2, NULL)) AS score2_groupA,
       MAX(IF(group='B', score1, NULL)) AS score1_groupB,
       MAX(IF(group='B', score2, NULL)) AS score2_groupB,
       MAX(IF(group='C', score1, NULL)) AS score1_groupC,
       MAX(IF(group='C', score2, NULL)) AS score2_groupC
FROM     needed_infos
GROUP BY app_id
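
To try the pivot step in isolation, here is a small sketch that runs it against an in-memory SQLite database from Python. It assumes a `needed_infos` table that already holds one row per app and group, with made-up column names `grp` and `dt` and invented sample rows. It uses the standard CASE expression instead of IF(), since IF() is MySQL-specific; in MySQL, `MAX(CASE WHEN grp = 'A' THEN dt END)` and `MAX(IF(grp = 'A', dt, NULL))` should behave the same.

```python
import sqlite3

# Hypothetical pre-filtered rows: (app_id, dt, grp, score1, score2).
ROWS = [
    (1, "2021-01-05", "A", 10, 20),
    (1, "2021-04-01", "B", 30, 40),
    (2, "2021-02-02", "A", 50, 60),
]

# Each CASE yields a value only for its own group and NULL otherwise;
# MAX then collapses the rows of one app_id into a single row.
QUERY = """
SELECT app_id,
       MAX(CASE WHEN grp = 'A' THEN dt END)     AS date_groupA,
       MAX(CASE WHEN grp = 'B' THEN dt END)     AS date_groupB,
       MAX(CASE WHEN grp = 'A' THEN score1 END) AS score1_groupA,
       MAX(CASE WHEN grp = 'B' THEN score1 END) AS score1_groupB
FROM needed_infos
GROUP BY app_id
ORDER BY app_id
"""


def pivot(rows):
    """Load the sample rows into SQLite and return one pivoted row per app_id."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE needed_infos (app_id INTEGER, dt TEXT, grp TEXT, "
        "score1 INTEGER, score2 INTEGER)"
    )
    con.executemany("INSERT INTO needed_infos VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pivot(ROWS):
        print(row)
```

An app without a row for some group simply gets NULL (None in Python) in that group's columns, as app 2 does for group B here.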

Full attempt. This is the combination of the previous snippets. The only difference is the backticks around the field names, which keep MySQL from confusing them with keywords like "date" (the DATE type) or "group" (used in the GROUP BY clause).

WITH `formatted_tab` AS (
    SELECT `id`,
           `app_id`,
           STR_TO_DATE(`date`, '%m/%d/%Y') AS `date`,
           `group`,
           `score1`,
           `score2`
    FROM   `tab`
)
SELECT `app_id`,
       MAX(IF(`group`='A', `date`  , NULL)) AS date_groupA,
       MAX(IF(`group`='B', `date`  , NULL)) AS date_groupB,
       MAX(IF(`group`='C', `date`  , NULL)) AS date_groupC,
       MAX(IF(`group`='A', `score1`, NULL)) AS score1_groupA,
       MAX(IF(`group`='A', `score2`, NULL)) AS score2_groupA,
       MAX(IF(`group`='B', `score1`, NULL)) AS score1_groupB,
       MAX(IF(`group`='B', `score2`, NULL)) AS score2_groupB,
       MAX(IF(`group`='C', `score1`, NULL)) AS score1_groupC,
       MAX(IF(`group`='C', `score2`, NULL)) AS score2_groupC
FROM ( SELECT needed_dates.*,
              formatted_tab.score1,
              formatted_tab.score2
       FROM (   SELECT   `app_id`,
                         MIN(`date`)    AS `date`,
                         `group`        
                FROM     `formatted_tab`
                WHERE    `group` IN ('A')
                GROUP BY `app_id`, 
                         `group` 
                UNION
                SELECT   `app_id`,
                         MAX(`date`)    AS `date`,
                         `group`        
                FROM     `formatted_tab`
                WHERE    `group` IN ('B', 'C')
                GROUP BY `app_id`, 
                         `group`
                ) needed_dates
       INNER JOIN formatted_tab
               ON needed_dates.app_id = formatted_tab.app_id
              AND needed_dates.date   = formatted_tab.date
              AND needed_dates.group  = formatted_tab.group
       ) needed_infos
GROUP BY `app_id`

You'll find a tested SQL Fiddle here.
