Home > Software engineering >  Vectorised calculation
Vectorised calculation

Time:03-04

I have data in a wide format containing a Category column listing types of transport and then columns with the name of that type of transport and totals.

I want to create the Calc column where each row is summed across the columns but the value for where the category and column name is the same is excluded.

So for the row total of Car, the sum would be train bus. The row total of Train would be Car Bus.

If there is a type of transport in the Category column which isnt listed as a column name, then there should be a NA in the Calc column.

The dataframe is as below, with the Calc column with the results added as expected.

Category<-c("Car","Train","Bus","Bicycle")
Car<-c(9,15,25,5)
Train<-c(8,22,1,7)
Bus<-c(5,2,4,8)
Calc<-c(13, 17,26,NA)
df<-data.frame(Category,Car,Train,Bus,Calc, stringsAsFactors = FALSE)

Can anyone suggest how to add the Calc column as per above? Ideally a vectorised calculation without a loop.

CodePudding user response:

Here is a tidyverse solution:

df<-data.frame(Category,Car,Train,Bus, stringsAsFactors = FALSE)

library(dplyr)
library(tidyr)
df |>
    pivot_longer(cols = !Category,
                 names_to = "cat2",
                 values_to = "value") |>
    group_by(Category) |>
    mutate(value = case_when((Category %in% cat2) ~ value,
                             TRUE ~ NA_real_)) |>
    filter(cat2 != Category) |>
    summarize(Calc = sum(value))  |>
    left_join(df)



# A tibble: 4 × 5
  Category  Calc   Car Train   Bus
  <chr>    <dbl> <dbl> <dbl> <dbl>
1 Bicycle     NA     5     7     8
2 Bus         26    25     1     4
3 Car         13     9     8     5
4 Train       17    15    22     2

CodePudding user response:

Using rowSums and a matrix for indexing.

# Example data
Category <- c("Car","Train","Bus","Bicycle")
Car <- c(9,15,25,5)
Train <- c(8,22,1,7)
Bus <- c(5,2,4,8)
df <- data.frame(Category,Car,Train,Bus, stringsAsFactors = FALSE)

# add the "Calc" column
df$Calc <- rowSums(df[,2:4]) - df[,2:4][matrix(c(1:nrow(df), match(df$Category, colnames(df)[2:4])), ncol = 2)]
df
#>   Category Car Train Bus Calc
#> 1      Car   9     8   5   13
#> 2    Train  15    22   2   17
#> 3      Bus  25     1   4   26
#> 4  Bicycle   5     7   8   NA
  •  Tags:  
  • r
  • Related