I have data in a wide format containing a Category column listing types of transport and then columns with the name of that type of transport and totals.
I want to create the Calc column where each row is summed across the columns but the value for where the category and column name is the same is excluded.
So for the row total of Car, the sum would be train bus. The row total of Train would be Car Bus.
If there is a type of transport in the Category column which isnt listed as a column name, then there should be a NA in the Calc column.
The dataframe is as below, with the Calc column with the results added as expected.
Category<-c("Car","Train","Bus","Bicycle")
Car<-c(9,15,25,5)
Train<-c(8,22,1,7)
Bus<-c(5,2,4,8)
Calc<-c(13, 17,26,NA)
df<-data.frame(Category,Car,Train,Bus,Calc, stringsAsFactors = FALSE)
Can anyone suggest how to add the Calc column as per above? Ideally a vectorised calculation without a loop.
CodePudding user response:
Here is a tidyverse solution:
df<-data.frame(Category,Car,Train,Bus, stringsAsFactors = FALSE)
library(dplyr)
library(tidyr)
df |>
pivot_longer(cols = !Category,
names_to = "cat2",
values_to = "value") |>
group_by(Category) |>
mutate(value = case_when((Category %in% cat2) ~ value,
TRUE ~ NA_real_)) |>
filter(cat2 != Category) |>
summarize(Calc = sum(value)) |>
left_join(df)
# A tibble: 4 × 5
Category Calc Car Train Bus
<chr> <dbl> <dbl> <dbl> <dbl>
1 Bicycle NA 5 7 8
2 Bus 26 25 1 4
3 Car 13 9 8 5
4 Train 17 15 22 2
CodePudding user response:
Using rowSums
and a matrix for indexing.
# Example data
Category <- c("Car","Train","Bus","Bicycle")
Car <- c(9,15,25,5)
Train <- c(8,22,1,7)
Bus <- c(5,2,4,8)
df <- data.frame(Category,Car,Train,Bus, stringsAsFactors = FALSE)
# add the "Calc" column
df$Calc <- rowSums(df[,2:4]) - df[,2:4][matrix(c(1:nrow(df), match(df$Category, colnames(df)[2:4])), ncol = 2)]
df
#> Category Car Train Bus Calc
#> 1 Car 9 8 5 13
#> 2 Train 15 22 2 17
#> 3 Bus 25 1 4 26
#> 4 Bicycle 5 7 8 NA