Sum column based on variable name in other column that contains x following similar letters

Time:07-08

I have a table that is somewhat like this:

var RC
distance50 2
distance20 4
precMax 5
precMin 1
total_prec 8
travelTime 5
travelTime 2

I want to sum all similar type variables, resulting in something like this:

var sum
dist 6
prec 14
trav 7

The first four letters are enough to tell the types apart. I have tried repeatedly but cannot work it out. Could anyone please assist? I generally work with dplyr, so a dplyr solution would be preferred. The datasets are small (n < 100), so speed is not a concern.

CodePudding user response:

Base R solution:

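# Drop the "total" rows, so total_prec is left out of the prec sum,
# then keep only the part of each name before its trailing
# capital-letter or digit suffix (distance50 -> distance).
aggregate(
  RC ~ var,
  data = transform(
    with(df, df[!(grepl("total", var)),]),
    var = gsub("^(\\w+)([A-Z0-9]\\w+$)", "\\1", var)
  ),
  FUN = sum
)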

Data:

df <- structure(list(var = c("distance50", "distance20", "precMax", 
"precMin", "total_prec", "travelTime", "travelTime"), RC = c(2L, 
4L, 5L, 1L, 8L, 5L, 2L)), class = "data.frame", row.names = c(NA, 
-7L))
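
If the total_prec row should count toward prec, and the four-letter labels from the question are wanted, a minimal base R sketch could look like the one below. It assumes that a leading "total_" is only a prefix and that the first four letters identify each type, as the question states. The names key and res are illustrative and not part of the answer above.

```r
# Example data, same values as the df defined above
df <- data.frame(
  var = c("distance50", "distance20", "precMax", "precMin",
          "total_prec", "travelTime", "travelTime"),
  RC  = c(2L, 4L, 5L, 1L, 8L, 5L, 2L)
)

# Assumption: "total_" is only a prefix, so total_prec belongs to prec
key <- sub("^total_", "", df$var)

# Assumption: the first four letters are enough to identify the type
key <- substr(key, 1, 4)

# Sum RC within each four-letter key
res <- aggregate(RC ~ var, data = data.frame(var = key, RC = df$RC), FUN = sum)
names(res)[2] <- "sum"
res
```

On the example data this gives dist = 6, prec = 14 and trav = 7, which is the result the question asks for.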

CodePudding user response:

library(dplyr)
library(tidyr)

df %>% 
  separate(var, c("var", "b"), sep = "[_A-Z0-9]", extra = "merge") %>% 
  group_by(var = ifelse(b %in% var, b, var)) %>% 
  summarize(RC = sum(RC), .groups = "drop")
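
  1. separate() splits var into two columns, var and b, at the first underscore (_), capital letter (A-Z) or digit (0-9); extra = "merge" keeps everything after that first split in b.
  2. In the group_by statement, if the value in b also occurs somewhere in the var column (as "prec" does for total_prec), b is used as the group name; otherwise var is kept.
  3. Lastly, sum RC by group.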

Output

  var         RC
  <chr>    <int>
1 distance     6
2 prec        14
3 travel       7
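
To see what the first two steps of the dplyr answer produce before the sum, the pipeline can be run one stage at a time. This is only a sketch for inspection: the names step1, step2, group and res2 are made up here, and a separate group column is used instead of overwriting var so that both values stay visible.

```r
library(dplyr)
library(tidyr)

# Example data, same values as the df used in the answers
df <- data.frame(
  var = c("distance50", "distance20", "precMax", "precMin",
          "total_prec", "travelTime", "travelTime"),
  RC  = c(2L, 4L, 5L, 1L, 8L, 5L, 2L)
)

# Step 1: split each name once, at the first underscore, capital or digit
step1 <- df %>%
  separate(var, c("var", "b"), sep = "[_A-Z0-9]", extra = "merge")

# Step 2: use b as the group when it also occurs in the var column
# (the case for total_prec, whose b is "prec"); otherwise keep var
step2 <- step1 %>%
  mutate(group = ifelse(b %in% var, b, var))

# Step 3: sum RC within each group
res2 <- step2 %>%
  group_by(group) %>%
  summarize(RC = sum(RC), .groups = "drop")

step2
res2
```

Printing step2 shows which rows fall into which group, which makes it easy to check the rule against new variable names before relying on the sums.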