R Dataframe: Sum undefined num columns based on 2 other column matches

Time:12-22

I have a dataframe that looks like this:

Name  Fruit    CostA  CostB
Adam  Orange   2       5
Adam  Apple    3       6
Bob   Orange   3       5
Cathy Orange   4       3
Cathy Orange   5       7

Code to create the data frame:

df <- data.frame(Name = c("Adam", "Adam", "Bob", "Cathy", "Cathy"),
                 Fruit = c("Orange", "Apple", "Orange", "Orange", "Orange"),
                 CostA = c(2, 3, 3, 4, 5),
                 CostB = c(5, 6, 5, 3, 7))

This data frame will grow by an unspecified number of columns (CostC, CostD, etc.). I need to sum all of the cost columns (CostA, CostB, etc.) for rows where both Name and Fruit match.

I have done something similar when the cost columns to be summed are known, using this:

aggregate(cbind(CostA, CostB) ~ Name + Fruit, df, sum)

However, I now need to do this without listing every column to be summed.

Answer:

If these are the only columns, use . on the left-hand side of the formula to stand for all remaining columns:

aggregate(. ~ Name + Fruit, df, sum)
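A minimal runnable check of the dot formula, rebuilding the sample table from the question. The CostC column is hypothetical, added only to illustrate that a newly added cost column is summed without touching the formula:

```r
# Sample data from the question; CostC is a hypothetical extra column
# included to show that new cost columns are picked up automatically.
df <- data.frame(
  Name  = c("Adam", "Adam", "Bob", "Cathy", "Cathy"),
  Fruit = c("Orange", "Apple", "Orange", "Orange", "Orange"),
  CostA = c(2, 3, 3, 4, 5),
  CostB = c(5, 6, 5, 3, 7),
  CostC = c(1, 1, 1, 1, 1)
)

# "." on the left-hand side means "every column not used on the right",
# so CostA, CostB and CostC are all summed per Name/Fruit pair.
res <- aggregate(. ~ Name + Fruit, data = df, FUN = sum)
print(res)
```

Cathy has two Orange rows, so that group collapses to CostA = 9, CostB = 10 and CostC = 2; the other three Name/Fruit pairs are single rows and pass through unchanged.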

If there are other columns as well and you want to include only the cost columns (in addition to Name and Fruit), subset the data first with the select argument:

aggregate(. ~ Name + Fruit,
    subset(df, select = c("Name", "Fruit", names(df)[startsWith(names(df), "Cost")])), sum)
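A sketch of the subset approach, assuming a hypothetical character column Store that should not be summed (with the plain . formula it would be swept into the sum and sum() would fail on it). Selecting columns by name with startsWith() keeps Name, Fruit and every Cost* column, however many there are:

```r
df <- data.frame(
  Name  = c("Adam", "Adam", "Bob", "Cathy", "Cathy"),
  Fruit = c("Orange", "Apple", "Orange", "Orange", "Orange"),
  Store = c("North", "South", "North", "North", "South"),  # hypothetical non-cost column
  CostA = c(2, 3, 3, 4, 5),
  CostB = c(5, 6, 5, 3, 7)
)

# Names of all columns that start with "Cost" (CostA, CostB, ... as they appear).
cost_cols <- names(df)[startsWith(names(df), "Cost")]

# Keep only the grouping columns plus the cost columns, then sum with ".".
res <- aggregate(. ~ Name + Fruit,
                 data = subset(df, select = c("Name", "Fruit", cost_cols)),
                 FUN = sum)
print(res)
```

Passing column names as a character vector keeps the selection unambiguous; Store is dropped before aggregate() ever sees it.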