I have a strangely structured data frame with a grouping variable as a factor, followed by a character variable identifying expected vs. actual values for a series of ongoing days (where the actual values are recorded each day in a new column), as well as cumulatively, and I'm trying to calculate the difference between the expected and the actual values.
However, given the structure of the data, I'm struggling to figure out how to do this without getting into a whole mess of reshaping the data only to have to put it back together like this.
Is there a way I can calculate the differences, for each group, across all 'daily' columns, and then store the output in each 'original' column?
Sample Data
df1 <- structure(list(Group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), Variable = c("Day", "Actual.Cumulative", "Actual.Daily", "Expected.Cumulative", "Expected.Daily", "Diff.Daily", "Diff.Cumulative", "Day", "Actual.Cumulative", "Actual.Daily", "Expected.Cumulative", "Expected.Daily", "Diff.Daily", "Diff.Cumulative", "Day", "Actual.Cumulative", "Actual.Daily", "Expected.Cumulative", "Expected.Daily", "Diff.Daily", "Diff.Cumulative"), Day1 = c("Sunday", "1", "1", "3", "3", NA, NA, "Sunday", "1", "1", "1", "1", NA, NA, "Sunday", "8", "8", "5", "5", NA, NA), Day2 = c("Monday", "4", "3", "7", "4", NA, NA, "Monday", "2", "1", "3", "2", NA, NA, "Monday", "17", "9", "10", "5", NA, NA), Day3 = c("Tuesday", "6", "2", "10", "3", NA, NA, "Tuesday", "3", "1", "4", "1", NA, NA, "Tuesday", "29", "12", "14", "4", NA, NA), Day4 = c("Wednesday", "11", "5", "12", "2", NA, NA, "Wednesday", "4", "1", "5", "1", NA, NA, "Wednesday", "32", "3", "17", "3", NA, NA)), row.names = c(NA, -21L), class = "data.frame")
df1
What I would like to do is fill in the Diff.Daily
cells in each "Day" column (i.e. Day1
, Day2
, etc.) by taking the difference between the Actual.Daily
and Expected.Daily
values for each Group
(A, B, C), and then do similarly for Diff.Cumulative
(i.e. Actual.Cumulative - Expected.Cumulative
).
Desired Output
df2 <- structure(list(Group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), Variable = c("Day", "Actual.Cumulative", "Actual.Daily", "Expected.Cumulative", "Expected.Daily", "Diff.Daily", "Diff.Cumulative", "Day", "Actual.Cumulative", "Actual.Daily", "Expected.Cumulative", "Expected.Daily", "Diff.Daily", "Diff.Cumulative", "Day", "Actual.Cumulative", "Actual.Daily", "Expected.Cumulative", "Expected.Daily", "Diff.Daily", "Diff.Cumulative"), Day1 = c("Sunday", "1", "1", "3", "3", "-2", "-2", "Sunday", "1", "1", "1", "1", "0", "0", "Sunday", "8", "8", "5", "5", "3", "3"), Day2 = c("Monday", "4", "3", "7", "4", "-1", "-3", "Monday", "2", "1", "3", "2", "-1", "-1", "Monday", "17", "9", "10", "5", "4", "7"), Day3 = c("Tuesday", "6", "2", "10", "3", "-1", "-4", "Tuesday", "3", "1", "4", "1", "0", "-1", "Tuesday", "29", "12", "14", "4", "8", "15"), Day4 = c("Wednesday", "11", "5", "12", "2", "3", "-1", "Wednesday", "4", "1", "5", "1", "0", "-1", "Wednesday", "32", "3", "17", "3", "0", "15")), row.names = c(NA, -21L), class = "data.frame")
df2
Here you can see for Group A on Day 1, Diff.Daily
= -2 as expected from Actual.Daily
of 1 minus an Expected.Daily
of 3.
My Attempts
As the "Day" columns are of character class because of their mixed nature, I know I need to wrap the calculations with as.numeric()
, and I can calculate the values individually for each column:
Diff.Cumulative <- as.numeric(df1[df1$Variable == "Actual.Cumulative", "Day1"]) - as.numeric(df1[df1$Variable == "Expected.Cumulative", "Day1"])
But I'm struggling with how to expand this to all "Day" columns and to fit it into the current data frame structure. I came across a similar but ultimately different question.
CodePudding user response:
Here's a way to do this with the help of a helper function.
library(dplyr)
change_values <- function(sub1, sub2, to, var, value) {
x <- suppressWarnings(as.numeric(value))
x[match(to, var)] <- x[match(sub1, var)] - x[match(sub2, var)]
x[1] <- value[1]
x
}
df1 %>%
group_by(Group) %>%
mutate(across(starts_with('Day'), ~change_values('Actual.Daily',
'Expected.Daily', 'Diff.Daily', Variable, .)),
across(starts_with('Day'), ~change_values('Actual.Cumulative',
'Expected.Cumulative', 'Diff.Cumulative', Variable, .))) %>%
ungroup
This returns -
# Group Variable Day1 Day2 Day3 Day4
#1 A Day Sunday Monday Tuesday Wednesday
#2 A Actual.Cumulative 1 4 6 11
#3 A Actual.Daily 1 3 2 5
#4 A Expected.Cumulative 3 7 10 12
#5 A Expected.Daily 3 4 3 2
#6 A Diff.Daily -2 -1 -1 3
#7 A Diff.Cumulative -2 -3 -4 -1
#8 B Day Sunday Monday Tuesday Wednesday
#9 B Actual.Cumulative 1 2 3 4
#10 B Actual.Daily 1 1 1 1
#11 B Expected.Cumulative 1 3 4 5
#12 B Expected.Daily 1 2 1 1
#13 B Diff.Daily 0 -1 0 0
#14 B Diff.Cumulative 0 -1 -1 -1
#15 C Day Sunday Monday Tuesday Wednesday
#16 C Actual.Cumulative 8 17 29 32
#17 C Actual.Daily 8 9 12 3
#18 C Expected.Cumulative 5 10 14 17
#19 C Expected.Daily 5 5 4 3
#20 C Diff.Daily 3 4 8 0
#21 C Diff.Cumulative 3 7 15 15