Home > Blockchain >  Is There a Way to Automate the Conversion of SQL Rows to Column Using Case?
Is There a Way to Automate the Conversion of SQL Rows to Column Using Case?

Time:05-24

I was playing with usa_names dataset on Bigquery and in order to be able to visualize the top 10 names between 1910 and 2020, I had to GROUP BY year and create a new column for each of the 10 names using CASE.

The thing is, I will like to visualize the top 100 and I want to know if there is a way to automate the CASE, in the sense that I don't have to write a "WHEN and THEN Clause for each name in order to create a column for them.

I had to use the following SQL query code to first get the top 10 names;

SELECT
  name,
  SUM(number) AS total
FROM
  bigquery-public-data.usa_names.usa_1910_current
WHERE
  year BETWEEN 1910 AND 2020
GROUP BY
  name
ORDER BY
  total DESC
LIMIT
  10

And then use the following code to convert each name row to columns;

SELECT
  year,
  SUM(CASE WHEN name = 'James' THEN number ELSE 0 END) AS James,
  SUM(CASE WHEN name = 'John' THEN number ELSE 0 END) AS John,
  SUM(CASE WHEN name = 'Robert' THEN number ELSE 0 END) AS Robert,
  SUM(CASE WHEN name = 'Michael' THEN number ELSE 0 END) AS Michael,
  SUM(CASE WHEN name = 'William' THEN number ELSE 0 END) AS William,
  SUM(CASE WHEN name = 'Mary' THEN number ELSE 0 END) AS Mary,
  SUM(CASE WHEN name = 'Richard' THEN number ELSE 0 END) AS Richard,
  SUM(CASE WHEN name = 'Joseph' THEN number ELSE 0 END) AS Joseph,
  SUM(CASE WHEN name = 'Charles' THEN number ELSE 0 END) AS Charles,
  SUM(CASE WHEN name = 'Thomas' THEN number ELSE 0 END) AS Thomas
FROM
  bigquery-public-data.usa_names.usa_1910_current
GROUP BY
  year
ORDER BY
  year

I want to achieve the same result without having to first pull out the name and manually enter them into the CASE statements.

Also, this won't be needed if there is a way to visualize the data directly without having to convert the names from row to columns.

Thanks.

CodePudding user response:

You shouldn't need to create a column for each name. Your first query is sufficient (would obviously just need to change the limit to 100). Based on the questions tags I'm assuming your using Tableau, so it would be as simple as choosing your desired visualisation (say a bar chart) and placing names on one axis and total on the other axis.

Based on your follow up comment it would look like this

SELECT
name,
year,
SUM(number) AS total
From bigquery-public-data.usa_names.usa_1910_current
WHERE name IN
(
SELECT name
FROM
(
SELECT
  name,
  SUM(number) AS total
FROM
  bigquery-public-data.usa_names.usa_1910_current
WHERE
  year BETWEEN 1910 AND 2020
GROUP BY
  name
ORDER BY
  total DESC
LIMIT
  100
))
GROUP BY name, year

You could also look into using calculate fields within Tableau ok the raw data to achieve the desired visualisation.

CodePudding user response:

You need to combine 2 capabilities:

  1. row to column: PIVOT clause
  2. scripting to automate the query finding the top 10 names
declare top_names default ((
select concat("'", string_agg(name, "','"), "'") 
from (
// your query in question
SELECT
  name
FROM
  bigquery-public-data.usa_names.usa_1910_current
WHERE
  year BETWEEN 1910 AND 2020
GROUP BY
  name
ORDER BY
  SUM(number) DESC
LIMIT
  10
)));
select top_names;

The output is:

'James','John','Robert','Michael','William','Mary','David','Richard','Joseph','Charles'

The PIVOT query you will need is:

SELECT * FROM
  (select year, name, sum(number) number
   from bigquery-public-data.usa_names.usa_1910_current
   group by year, name
   )
  PIVOT(SUM(number) FOR name IN ('James','John','Robert','Michael','William','Mary','David','Richard','Joseph','Charles'
))

which output exactly as your second query.

To stick the 2 together, you will need something like:

execute immediate concat("<pivot query before names>", top_names, "<pivot query after names>");
  • Related