I have a dataframe of inputs and another with a set of combinations that I'd like to aggregate those inputs using.
It's relatively straightforward to use the function in the example below if all the combinations (in "teams") were addition. However, if I want to subtract a result (ie. team1 is the sum of Mary and Frank minus John) this becomes seemingly much more problematic.
I can't think of a straightforward way to accomplish this. Does anyone have a suggestion on what I might try?
## Inputs ##
input = structure(list(V1 = c("Team_2022", "Team_2022", "Team_2022"), V2 = c("Frank", "Mary", "John"), V3 = c("Sydney", "Sydney", "Sydney"), V4 = c(55, 76, 14)), row.names = c(NA, -3L), class = c("data.table", "data.frame"))
## Teams ##
teams = structure(list(V1 = c("team1", "team2", "team3"), V2 = c("Mary Frank - John","Mary John - Frank", "John Frank - Mary")), class = "data.frame", row.names = c(NA, -3L))
## Group the inputs into one ##
all_objects = ls()
input_objects = grep("^input", all_objects, value = T)
input_test = as.data.frame(input_objects)
## Function to add (NB: this doesn't not give the desired output)##
result = function(input, teams) {
data = teams %>%
separate_rows(V2) %>%
left_join(input, by = c("V2" = "V2")) %>%
replace_na(list(V4 = 0)) %>%
group_by(V1.x) %>% fill(V1.y, V3) %>%
summarize(V1.y = first(V1.y),
V2 = paste(V2, collapse = " "),
V3 = first(V3),
V4 = sum(V4))
return(data)
}
## Output ##
desired_output = structure(list(V1.x = c("team1", "team2", "team3"), V1.y = c("Team_2022", "Team_2022", "Team_2022"), V2 = c("Mary Frank - John", "Mary John - Frank", "John Frank - Mary"), V3 = c("Sydney", "Sydney", "Sydney"), V4 = c(117, 35, -7)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L))
CodePudding user response:
You can accomplish this by first creating a new column in the input
dataframe that indicates whether each row should be added or subtracted from the final result. You can do this by splitting the teams$V2
column into separate rows, then using the separate function to split the V2
column into separate columns for the variable names and the addition/subtraction operators. Then you can use ifelse() function to check the operator, if it's " " then return the value of V4 else return the negative of V4.
Then you can join the input
dataframe and the modified teams
dataframe on the variable names, and use the group_by()
and summarize()
functions to aggregate the values based on the team name.
Check the following sample code
# Create new column in input indicating whether to add or subtract each row
input$operator = ifelse(input$V2 %in% teams$V2 & grepl(" ", teams$V2), " ", "-")
# Split teams$V2 into separate rows
teams = teams %>% separate_rows(V2)
# Split teams$V2 into separate columns for variable names and operators
teams = teams %>% separate(V2, c("V2", "operator"), " -")
# Join input and teams on variable names and operator
data = teams %>% left_join(input, by = c("V2" = "V2", "operator" = "operator"))
# Replace NA values with 0
data = data %>%
replace_na(list(V4 = 0))
# Group by team name and summarize values
result = data %>%
group_by(V1.x) %>%
summarize(V1.y = first(V1.y),
V2 = paste(V2, collapse = " "),
V3 = first(V3),
V4 = sum(V4))
CodePudding user response:
I propose a different approach. Consider the variables in teams as formulas to evaluate. Then create an empty environment and evaluate them. Inspired by this.
# Inputs ##
input <- structure(list(V1 = c("Team_2022", "Team_2022", "Team_2022"),
V2 = c("Frank", "Mary", "John"),
V3 = c("Sydney", "Sydney", "Sydney"),
V4 = c(55, 76, 14)),
row.names = c(NA, -3L), class = c("data.table", "data.frame"))
## Teams ##
teams <- structure(list(
V1 = c("team1", "team2", "team3"),
V2 = c("Mary Frank - John","Mary John - Frank", "John Frank - Mary")),
class = "data.frame", row.names = c(NA, -3L))
# Evaluation environment with only and -
eval_env <- new.env(parent = emptyenv())
eval_env$` ` = ` `
eval_env$`-` = `-`
# assign names as variables in the environment
for(i in 1:nrow(input)) assign(input$V2[i], input$V4[i], envir = eval_env)
#Evaluate formulas
V4 <- sapply(teams$V2, \(x) eval(str2lang(x), eval_env))
V4
#Mary Frank - John Mary John - Frank John Frank - Mary
# 117 35 -7