Looking for an efficient way of making a new data frame of totals across categories in R-CodePudding

Total R beginner here, looking for the quickest / most sensible way to do this:

I have a data frame that looks similar to this (but much longer):

dataframe:

date	a	b	c
1/1/2021	4	3	2
1/2/2021	2	2	1
1/3/2021	5	3	5

I am attempting to create a new data frame showing totals for a, b, and c (which go on for a while), and don't need the dates. I want to make a data frame that would look this:

letter	total
a	11
b	8
c	8

So far, the closest I have got to this is by writing a pipe like this:

dataframe <- totals %>%
   summarize(total_a = sum(a), total_b = sum(b), total_c = sum(c))

which almost gives me what I want, a data frame that looks like this: |a|b|c| |:-:|:-:|:-:| |11|8|8|

Is there a way (besides manually typing out a new data frame for totals) to quickly turn my totals table into the format I'm looking for? Or is there a better way to write the pipe that will give me the table I want? I want to use these totals to make a pie chart but am running into problems when I attempt to make a pie chart out of the table like I have it now. I really appreciate any help in advance and hope I was able to explain what I'm trying to do correctly.

CodePudding user response：

Try this :

totals %>% select(a:c) %>% colSums() %>% as.list() %>% as_tibble() %>% 
  pivot_longer(everything(), names_to = "letter", values_to = "total")

Actually totals %>% select(a:c) %>% colSums() gives what you need as a named vector and the next steps are to turn that into a tibble again. You can skip that part if you don't need it.

CodePudding user response：

One efficient way is to use colSums from base R, where we get the sums of each column, excluding the date column (hence the reason for the -1 in df[,1]. Then, I use stack to put into long format. The [,2:1] is just changing the order of the column output, so that letter is first and total is second. I wrap this in setNames to rename the column names.

setNames(nm=c("letter", "total"),stack(colSums(df[,-1]))[,2:1])

  letter total
1      a    11
2      b     8
3      c     8

Or with tidyverse, we can get the sum of every column, except for date. Then, we can put it into long format using pivot_longer.

df %>% 
  summarise(across(-date, sum)) %>% 
  pivot_longer(everything(), names_to = "letter", values_to = "total")

Or another option using data.table:

library(data.table)
dt <- as.data.table(df)

melt(dt[,-1][, lapply(.SD, sum)], id.vars=integer(), variable.name = "letter", value.name = "total")

Data

df <- structure(list(date = c("1/1/2021", "1/2/2021", "1/3/2021"), 
                     a = c(4L, 2L, 5L), b = c(3L, 2L, 3L), c = c(2L, 1L, 5L)), 
                class = "data.frame", row.names = c(NA, -3L))