Home > Mobile >  SQL in R - Transform a column into rows
SQL in R - Transform a column into rows

Time:12-25

I have the following query:

sqldf("
      SELECT YEAR, SUM(SUB1) AS 'SUB1', SUM(SUB2) AS 'SUB2',
      SUM(SUB3) AS 'SUB3', SUM(SUB4) AS 'SUB4', SUM(SUB5) AS 'SUB5'
      FROM  table1
      GROUP BY YEAR
      "
)

I got this table:

enter image description here

Is it possible to change SUB1 TO SUB5 from columns to rows?

CodePudding user response:

Two methods:

  1. Pivot post-query:

    sqldf::sqldf("
          SELECT YEAR, SUM(SUB1) AS 'SUB1', SUM(SUB2) AS 'SUB2',
          SUM(SUB3) AS 'SUB3', SUM(SUB4) AS 'SUB4', SUM(SUB5) AS 'SUB5'
          FROM  table1
          GROUP BY YEAR
          "
    ) |>
      reshape2::melt("YEAR")
    #    YEAR variable value
    # 1  2019     SUB1 19638
    # 2  2020     SUB1  3223
    # 3  2021     SUB1  8184
    # 4  2022     SUB1 18017
    # 5  2019     SUB2 16854
    # 6  2020     SUB2  2731
    # 7  2021     SUB2  7034
    # 8  2022     SUB2 15487
    # 9  2019     SUB3  1087
    # 10 2020     SUB3  2278
    # 11 2021     SUB3  5922
    # 12 2022     SUB3 12989
    # 13 2019     SUB4  8598
    # 14 2020     SUB4  1385
    # 15 2021     SUB4  3629
    # 16 2022     SUB4  7798
    # 17 2019     SUB5   177
    # 18 2020     SUB5    45
    # 19 2021     SUB5    72
    # 20 2022     SUB5   181
    

    This can also be done with data.table::melt (same syntax) or tidyr::pivot_longer just as well.

  2. In SQL, a bit more work (and less-flexible):

    sqldf::sqldf("
    select year, 'SUB1' as variable, sum(SUB1) as value from table1 group by year
    union all
    select year, 'SUB2' as variable, sum(SUB2) as value from table1 group by year
    union all
    select year, 'SUB3' as variable, sum(SUB3) as value from table1 group by year
    union all
    select year, 'SUB4' as variable, sum(SUB4) as value from table1 group by year
    union all
    select year, 'SUB5' as variable, sum(SUB5) as value from table1 group by year
    ")
    

Data

table1 <- structure(list(YEAR = 2019:2022, SUB1 = c(19638L, 3223L, 8184L, 18017L), SUB2 = c(16854L, 2731L, 7034L, 15487L), SUB3 = c(1087L, 2278L, 5922L, 12989L), SUB4 = c(8598L, 1385L, 3629L, 7798L), SUB5 = c(177L, 45L, 72L, 181L)), class = "data.frame", row.names = c(NA, -4L))

(It just-so-happens that this sample data is the same as the output in the question's image ... the sum will be the same, I did not try to mimic an un-aggregated frame.)

  • Related