Suppose we have the below starting data frame called "data", with generating code beneath it (note that in this example the starting data frame is static, whereas in the actual code this is used in the data frame is generated by a function where the size of the data frame (rows and columns) varies based on the actual data the function is run against):
> data
To A B C Sum
1 A 1 3 5 9
2 B 2 4 6 12
3 C 3 5 7 15
4 Sum 6 12 18 36
library(dplyr)
data <-
data.frame(
To = c("A","B","C"),
A = c(1,2,3),
B = c(3,4,5),
C = c(5,6,7)
)
data <- data %>%
replace(is.na(.), 0) %>%
bind_rows(summarise_all(., ~(if(is.numeric(.)) sum(.) else "Sum")))
data <- cbind(data, Sum = rowSums(data[,-1]))
I am trying to create a new data frame that calculates each cell value from the starting data frame "data" as a percentage of its respective column total, so we end up with a data frame that looks like this (call it "data1")(but please note the caveat below this data frame):
To A B C Sum
1 A 0.167 0.250 0.278 0.250
2 B 0.333 0.333 0.333 0.333
3 C 0.500 0.417 0.389 0.417
4 Sum 1.000 1.000 1.000 1.000
Caveat: the size of the starting data frame ("data") is variable in the actual code based on the data it is run against, and not static like in this MWE. So the percentage formula needs to reach to the value in the last row of each column to use in the denominator in the percentage calculation. Fixed references (such as data[2,4]
for first column total) won't work.
Any suggestions for an efficient way to do this? With a preference for dplyr
.
CodePudding user response:
You can do:
library(tidyverse)
data |>
mutate(across(-c(To), ~ ./.[To == "Sum"])) |>
as_tibble()
# A tibble: 4 x 5
To A B C Sum
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 0.167 0.25 0.278 0.25
2 B 0.333 0.333 0.333 0.333
3 C 0.5 0.417 0.389 0.417
4 Sum 1 1 1 1