Total R beginner here, looking for the quickest / most sensible way to do this:
I have a data frame that looks similar to this (but much longer):
dataframe:
date | a | b | c |
---|---|---|---|
1/1/2021 | 4 | 3 | 2 |
1/2/2021 | 2 | 2 | 1 |
1/3/2021 | 5 | 3 | 5 |
I am attempting to create a new data frame showing totals for a, b, and c (which go on for a while), and don't need the dates. I want to make a data frame that would look this:
letter | total |
---|---|
a | 11 |
b | 8 |
c | 8 |
So far, the closest I have got to this is by writing a pipe like this:
dataframe <- totals %>%
summarize(total_a = sum(a), total_b = sum(b), total_c = sum(c))
which almost gives me what I want, a data frame that looks like this: |a|b|c| |:-:|:-:|:-:| |11|8|8|
Is there a way (besides manually typing out a new data frame for totals) to quickly turn my totals table into the format I'm looking for? Or is there a better way to write the pipe that will give me the table I want? I want to use these totals to make a pie chart but am running into problems when I attempt to make a pie chart out of the table like I have it now. I really appreciate any help in advance and hope I was able to explain what I'm trying to do correctly.
CodePudding user response:
Try this :
totals %>% select(a:c) %>% colSums() %>% as.list() %>% as_tibble() %>%
pivot_longer(everything(), names_to = "letter", values_to = "total")
Actually totals %>% select(a:c) %>% colSums()
gives what you need as a named vector and the next steps are to turn that into a tibble
again. You can skip that part if you don't need it.
CodePudding user response:
One efficient way is to use colSums
from base R, where we get the sums of each column, excluding the date
column (hence the reason for the -1
in df[,1]
. Then, I use stack
to put into long format. The [,2:1]
is just changing the order of the column output, so that letter
is first and total
is second. I wrap this in setNames
to rename the column names.
setNames(nm=c("letter", "total"),stack(colSums(df[,-1]))[,2:1])
letter total
1 a 11
2 b 8
3 c 8
Or with tidyverse
, we can get the sum of every column, except for date
. Then, we can put it into long format using pivot_longer
.
df %>%
summarise(across(-date, sum)) %>%
pivot_longer(everything(), names_to = "letter", values_to = "total")
Or another option using data.table
:
library(data.table)
dt <- as.data.table(df)
melt(dt[,-1][, lapply(.SD, sum)], id.vars=integer(), variable.name = "letter", value.name = "total")
Data
df <- structure(list(date = c("1/1/2021", "1/2/2021", "1/3/2021"),
a = c(4L, 2L, 5L), b = c(3L, 2L, 3L), c = c(2L, 1L, 5L)),
class = "data.frame", row.names = c(NA, -3L))