Home > Software engineering >  Sum and subtract values in a dataframe
Sum and subtract values in a dataframe

Time:01-25

I have a dataframe of inputs and another with a set of combinations that I'd like to aggregate those inputs using.

It's relatively straightforward to use the function in the example below if all the combinations (in "teams") were addition. However, if I want to subtract a result (ie. team1 is the sum of Mary and Frank minus John) this becomes seemingly much more problematic.

I can't think of a straightforward way to accomplish this. Does anyone have a suggestion on what I might try?

## Inputs ##

input = structure(list(V1 = c("Team_2022", "Team_2022", "Team_2022"), V2 = c("Frank", "Mary", "John"), V3 = c("Sydney", "Sydney", "Sydney"), V4 = c(55, 76, 14)), row.names = c(NA, -3L), class = c("data.table", "data.frame"))

## Teams ##

teams = structure(list(V1 = c("team1", "team2", "team3"), V2 = c("Mary   Frank - John","Mary   John - Frank", "John   Frank - Mary")), class = "data.frame", row.names = c(NA, -3L))

## Group the inputs into one ##

all_objects = ls()
input_objects = grep("^input", all_objects, value = T)
input_test = as.data.frame(input_objects)

## Function to add (NB: this doesn't not give the desired output)##

result = function(input, teams) {
  data = teams %>%
    separate_rows(V2) %>%
    left_join(input, by = c("V2" = "V2")) %>%
    replace_na(list(V4 = 0)) %>%
    group_by(V1.x) %>% fill(V1.y, V3) %>%
    summarize(V1.y = first(V1.y),
              V2 = paste(V2, collapse = "   "),
              V3 = first(V3),
              V4 = sum(V4))
  return(data)
}
## Output ##

desired_output = structure(list(V1.x = c("team1", "team2", "team3"), V1.y = c("Team_2022", "Team_2022", "Team_2022"), V2 = c("Mary   Frank - John", "Mary   John - Frank", "John   Frank - Mary"), V3 = c("Sydney", "Sydney", "Sydney"), V4 = c(117, 35, -7)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L))

CodePudding user response:

You can accomplish this by first creating a new column in the input dataframe that indicates whether each row should be added or subtracted from the final result. You can do this by splitting the teams$V2 column into separate rows, then using the separate function to split the V2 column into separate columns for the variable names and the addition/subtraction operators. Then you can use ifelse() function to check the operator, if it's " " then return the value of V4 else return the negative of V4.

Then you can join the input dataframe and the modified teams dataframe on the variable names, and use the group_by() and summarize() functions to aggregate the values based on the team name.

Check the following sample code

# Create new column in input indicating whether to add or subtract each row
input$operator = ifelse(input$V2 %in% teams$V2 & grepl(" ", teams$V2), " ", "-")

# Split teams$V2 into separate rows
teams = teams %>% separate_rows(V2)

# Split teams$V2 into separate columns for variable names and operators
teams = teams %>% separate(V2, c("V2", "operator"), " -")

# Join input and teams on variable names and operator
data = teams %>% left_join(input, by = c("V2" = "V2", "operator" = "operator"))

# Replace NA values with 0
data = data %>% 
    replace_na(list(V4 = 0))

# Group by team name and summarize values
result = data %>% 
       group_by(V1.x) %>% 
          summarize(V1.y = first(V1.y),
          V2 = paste(V2, collapse = "   "),
          V3 = first(V3),
          V4 = sum(V4))

CodePudding user response:

I propose a different approach. Consider the variables in teams as formulas to evaluate. Then create an empty environment and evaluate them. Inspired by this.

# Inputs ##
input <- structure(list(V1 = c("Team_2022", "Team_2022", "Team_2022"), 
                       V2 = c("Frank", "Mary", "John"), 
                       V3 = c("Sydney", "Sydney", "Sydney"), 
                       V4 = c(55, 76, 14)),
                  row.names = c(NA, -3L), class = c("data.table", "data.frame"))

## Teams ##
teams <- structure(list(
  V1 = c("team1", "team2", "team3"), 
  V2 = c("Mary   Frank - John","Mary   John - Frank", "John   Frank - Mary")), 
  class = "data.frame", row.names = c(NA, -3L))

# Evaluation environment with only   and -
eval_env <- new.env(parent = emptyenv())
eval_env$` ` = ` `
eval_env$`-` = `-`

# assign names as variables in the environment
for(i in 1:nrow(input)) assign(input$V2[i], input$V4[i], envir = eval_env)

#Evaluate formulas
V4 <- sapply(teams$V2, \(x) eval(str2lang(x), eval_env))

V4
#Mary   Frank - John Mary   John - Frank John   Frank - Mary 
#                117                  35                  -7

  • Related