Home > OS >  Looking for an efficient way of making a new data frame of totals across categories in R
Looking for an efficient way of making a new data frame of totals across categories in R

Time:02-14

Total R beginner here, looking for the quickest / most sensible way to do this:

I have a data frame that looks similar to this (but much longer):

dataframe:

date a b c
1/1/2021 4 3 2
1/2/2021 2 2 1
1/3/2021 5 3 5

I am attempting to create a new data frame showing totals for a, b, and c (which go on for a while), and don't need the dates. I want to make a data frame that would look this:

letter total
a 11
b 8
c 8

So far, the closest I have got to this is by writing a pipe like this:

dataframe <- totals %>%
   summarize(total_a = sum(a), total_b = sum(b), total_c = sum(c))

which almost gives me what I want, a data frame that looks like this: |a|b|c| |:-:|:-:|:-:| |11|8|8|

Is there a way (besides manually typing out a new data frame for totals) to quickly turn my totals table into the format I'm looking for? Or is there a better way to write the pipe that will give me the table I want? I want to use these totals to make a pie chart but am running into problems when I attempt to make a pie chart out of the table like I have it now. I really appreciate any help in advance and hope I was able to explain what I'm trying to do correctly.

CodePudding user response:

Try this :

totals %>% select(a:c) %>% colSums() %>% as.list() %>% as_tibble() %>% 
  pivot_longer(everything(), names_to = "letter", values_to = "total")

Actually totals %>% select(a:c) %>% colSums() gives what you need as a named vector and the next steps are to turn that into a tibble again. You can skip that part if you don't need it.

CodePudding user response:

One efficient way is to use colSums from base R, where we get the sums of each column, excluding the date column (hence the reason for the -1 in df[,1]. Then, I use stack to put into long format. The [,2:1] is just changing the order of the column output, so that letter is first and total is second. I wrap this in setNames to rename the column names.

setNames(nm=c("letter", "total"),stack(colSums(df[,-1]))[,2:1])

  letter total
1      a    11
2      b     8
3      c     8

Or with tidyverse, we can get the sum of every column, except for date. Then, we can put it into long format using pivot_longer.

df %>% 
  summarise(across(-date, sum)) %>% 
  pivot_longer(everything(), names_to = "letter", values_to = "total")

Or another option using data.table:

library(data.table)
dt <- as.data.table(df)

melt(dt[,-1][, lapply(.SD, sum)], id.vars=integer(), variable.name = "letter", value.name = "total")

Data

df <- structure(list(date = c("1/1/2021", "1/2/2021", "1/3/2021"), 
                     a = c(4L, 2L, 5L), b = c(3L, 2L, 3L), c = c(2L, 1L, 5L)), 
                class = "data.frame", row.names = c(NA, -3L))
  • Related