I have a table that is somewhat like this:
var | RC |
---|---|
distance50 | 2 |
distance20 | 4 |
precMax | 5 |
precMin | 1 |
total_prec | 8 |
travelTime | 5 |
travelTime | 2 |
I want to sum all variables of the same type, resulting in something like this:
var | sum |
---|---|
dist | 6 |
prec | 14 |
trav | 7 |
The first 4 letters are enough to tell the different types apart. I have tried repeatedly but have not figured it out. Could anyone please assist? I generally try to work with dplyr, so that would be preferred. The datasets are small (n<100), so speed is not a concern.
CodePudding user response:
Base R solution:
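aggregate(
  RC ~ var,
  data = transform(
    # drop rows whose name contains "total" (so total_prec is not counted)
    with(df, df[!(grepl("total", var)), ]),
    # strip the trailing part that starts with a capital letter or digit,
    # e.g. distance50 -> distance, precMax -> prec
    var = gsub("^(\\w+)([A-Z0-9]\\w+$)", "\\1", var)
  ),
  FUN = sum
)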
Data:
df <- structure(list(var = c("distance50", "distance20", "precMax",
"precMin", "total_prec", "travelTime", "travelTime"), RC = c(2L,
4L, 5L, 1L, 8L, 5L, 2L)), class = "data.frame", row.names = c(NA,
-7L))
CodePudding user response:
library(dplyr)
library(tidyr)
df %>%
  separate(var, c("var", "b"), sep = "[_A-Z0-9]", extra = "merge") %>%
  group_by(var = ifelse(b %in% var, b, var)) %>%
  summarize(RC = sum(RC), .groups = "drop")
- `separate` splits `var` into two columns by splitting on underscores (`_`), capital letters (`A-Z`), or numbers (`0-9`).
- In the `group_by` statement, if the value in the second column can be found in the first column, it is used as the group name instead (so `total_prec` ends up grouped under `prec`).
- Lastly, sum `RC` by group.
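To make the steps above a bit more concrete, here is a small sketch that stops after the `separate` call so the intermediate columns can be inspected. It rebuilds the question's data with `data.frame()`; the name `parts` is just an illustrative choice, not something from the answer.

```r
library(tidyr)

# Same values as the question's table
df <- data.frame(
  var = c("distance50", "distance20", "precMax", "precMin",
          "total_prec", "travelTime", "travelTime"),
  RC = c(2L, 4L, 5L, 1L, 8L, 5L, 2L)
)

# Split only at the first underscore, capital letter, or digit;
# extra = "merge" keeps whatever follows that first split in `b`.
parts <- separate(df, var, c("var", "b"), sep = "[_A-Z0-9]", extra = "merge")
parts

# TRUE marks rows whose `b` value also occurs in `var`;
# these are the rows that group_by() relabels with `b`.
parts$b %in% parts$var
```

Only the `total_prec` row should be flagged here, which is why it is counted together with `precMax` and `precMin`.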
Output
var RC
<chr> <int>
1 distance 6
2 prec 14
3 travel 7
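For comparison, the four-letter idea mentioned in the question could be sketched roughly like this. It assumes that a leading `total_` is only a prefix and can be stripped before taking the first four letters; that assumption and the column name `type` are mine, not from the question.

```r
library(dplyr)

# Same values as the question's table
df <- data.frame(
  var = c("distance50", "distance20", "precMax", "precMin",
          "total_prec", "travelTime", "travelTime"),
  RC = c(2L, 4L, 5L, 1L, 8L, 5L, 2L)
)

result <- df %>%
  # Assumption: drop a leading "total_" so total_prec counts as "prec",
  # then keep the first four characters as the type label.
  mutate(type = substr(sub("^total_", "", var), 1, 4)) %>%
  group_by(type) %>%
  summarize(sum = sum(RC), .groups = "drop")

result
```

This is more brittle than splitting on a pattern, since it depends on every type being distinguishable by its first four letters, but for small tables like this one it may be enough.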