How to create new column by summarizing given columns with contains '_sales'?
The code as below: the result of md %>% rowwise()%>% mutate(sub_total=across(contains("_sales"),sum))
isn't what i want.
md$sub_total <- md$a_sales md$b_sales md$d_sales
can work,but a little complicated when the given columns are more than current
ori_data <- data.frame(a_sales=c(1:5),
tsalses=c(1:5),
b_sales=c(7:11),
d_sales=c(1:5))
# this can't work
md %>% rowwise()%>% mutate(sub_total=across(contains("_sales"),sum))
# this can work, but the code is little boring
md$sub_total <- md$a_sales md$b_sales md$d_sales
CodePudding user response:
The issue is with how you are using across. Right now what you are saying is that for every column containing "_sales" you want to do a sum. What you actually want is to sum these columns together.
Instead of using rowwise()
and sum()
we can simply use rowSums()
:
ori_data %>%
mutate(sub_total = rowSums(across(contains("_sales"))))
CodePudding user response:
@Wietse's answer gives you a solution based on your data in your current format.
It is also true to say that one reason why tidyverse function's aren't easy to use in this case is that your data frame is not tidy - because your column names contain information that you need to use in your summary.
To make your data tidy, you need to pivot your data to make it longer. Then the tidyverse functions become easier to use. Here's some code to tidy your data and calculate your sub_total all in one step.
md %>%
full_join(
md %>%
pivot_longer(ends_with("_sales"), names_to="Variable", values_to="Value") %>%
group_by(tsalses) %>%
summarise(sub_total=sum(Value), .groups="drop"),
by="tsalses"
)
a_sales tsalses b_sales d_sales sub_total
1 1 1 7 1 9
2 2 2 8 2 12
3 3 3 9 3 15
4 4 4 10 4 18
5 5 5 11 5 21
If this problem is part of a larger workflow, it might be worth investing time to make your data tidy before you start. If you do so, your life will almost certainly be easier in the long run.