Home > Net >  Adding column sums to a data.table as a new row at the end
Adding column sums to a data.table as a new row at the end

Time:11-28

I am doing some operations on a data.table and getting a result. So far so good. Next, I want the result to also show the sums across some columns, but I can't get that to work.

I filter my table by rows where x1=1, and compute a metric by Group1:

dt[x1 == 1, .N, by = c("Group1")][, 
  "%" := round(N /sum(N) * 100, 0)] [
  ]

giving

   Group1 N  %
1:     2 6 40
2:     1 6 40
3:     3 2 13
4:     5 1  7

I would just like to add a row to the above table that gives the sum across all columns.

I can just do

colSums(.Last.value)

and get the answer in a in a separate console, but what if I wanted to just append a new row to the above table itself, something like:

   Group1 N  %
1:     2 6 40
2:     1 6 40
3:     3 2 13
4:     5 1  7  
ColSum: -- 15 100

CodePudding user response:

As a hacked mod to akrun's answer, here's a custom printing function that works around data.table's omission of row names.

prettyDT <- function(x, ...) {
  out <- capture.output(data.table:::print.data.table(x, ...))
  nms <- rownames(x)
  gre <- gregexpr("^([0-9] )(?=:)", out, perl = TRUE)
  newnms <- nms[as.integer(regmatches(out, gre), nms)]
  wids <- nchar(newnms)
  newnms[!is.na(wids)] <- sprintf(paste0("%", max(wids, na.rm = TRUE), "s"), newnms[!is.na(wids)])
  regmatches(out, gre)[!is.na(wids)] <- newnms[!is.na(wids)]
  pre <- strrep(" ", diff(range(wids, na.rm = TRUE)))
  out[is.na(wids)] <- paste0(pre, out[is.na(wids)])
  cat(out, sep = "\n")
}

With this, we can do:

out <- rbindlist(list(
  DT,
  DT[, c(.(Group1 = "--"), lapply(.SD, sum)), .SDcols = c("N", "%")]
))
rownames(out)[nrow(out)] <- "Colsum"
prettyDT(out)
#         Group1     N     %
#         <char> <int> <int>
#      1:      2     6    40
#      2:      1     6    40
#      3:      3     2    13
#      4:      5     1     7
# Colsum:     --    15   100

Admittedly, this is a bit of a hack, and requires explicit calling of a udf to get the desired output.


Data

DT <- setDT(structure(list(Group1 = c("2", "1", "3", "5"), N = c(6L, 6L, 2L, 1L), "%" = c(40L, 40L, 13L, 7L)), class = c("data.table", "data.frame"), row.names = c(NA, -4L)))

CodePudding user response:

Since I don't understand your sample dataset, I guess this can help solve your issue.

I would suggest that you use the janitor package to wrap up your column total or row total

See sample below

library(janitor)

set.seed(10)
df_sample<- sample(1:nrow(iris), 10)
df<-iris[df_sample, ]

#This would sum all the rows together and return total

df%>%
  select(Species,Sepal.Width, Petal.Length, Petal.Width)%>%
  adorn_totals(where = "row")

#This would sum all columns and return total

df%>%
  select(Species,Sepal.Width, Petal.Length, Petal.Width)%>%
  adorn_totals(where = "col")

I hope that this answered your question.

  • Related