Group variable and sum under condition in R

Time:06-19

I would like to sum the values of "dollvalue" for the rows where ispurchase == 1, but I could not find an efficient solution. The solutions I tried from other posts all seemed too complex and ended up not working. I tried combining group_by and aggregate with plyr, but I get the error: argument FUN is missing.

library(plyr)
returntrip <- roundtrips %>%
  group_by(id) %>%
  aggregate(purchcost = sum(dollvalue[ispurchase==1], 
                            FUN = sum)) %>%
  ungroup

I also tried to simply aggregate it, and I think it almost works, but I get the following error: Error in aggregate.data.frame(as.data.frame(x), ...) : arguments must have same length

I assume this is because the list and the data frame do not have the same length. Is there any way to fix this?

returntrip <- aggregate(x = roundtrips$dollvalue[roundtrips$ispurchase==1],
      by = list(roundtrips$id),
      FUN = sum)

This is what the head of the data frame looks like:

                   ethamount                               dollvalue     id ispurchase             dollarcum
 1:  0.0000877963125548729991613761125535   -0.0010491659350307322180057001403952    883          1  0.000000000000000000
 2:  0.0010000000000000000208166817117217   -0.0107400000000000012817524819297432  36927          1  0.000000000000000000
 3: 75.4154000000000053205440053716301918 -804.6823180000000093059497885406017303   2637          1  0.000000000000000000
 4:  0.1066286798619889564232465772875003   -1.0662867986198896197436170041328296  72274          1  0.000000000000000000
 5:  0.0100000000000000002081668171172169   -0.1000000000000000055511151231257827  94359          1  0.010899999999999993
 6:  0.1000000000000000055511151231257827   -0.9460000000000001740829702612245455   3083          1  0.000000000000000000
 7:  1.0000000000000000000000000000000000   -9.3499999999999996447286321199499071 102645          1  0.000000000000000000
 8:  0.0000000000000000010000000000000001   -0.0000000000000000098900000000000005 117464          1  0.000000000000000000
 9:  0.0100000000000000002081668171172169   -0.1108999999999999985789145284797996  91239          1 -0.010899999999999993
10: 12.0000000000000000000000000000000000 -144.9600000000000079580786405131220818  52894          1  0.000000000000000000
11: 14.7899999999999991473487170878797770 -207.0600000000000022737367544323205948  80993          1  0.000000000000000000
12: 55.2299999999999968736119626555591822 -689.2703999999999950887286104261875153  74580          1  0.000000000000000000
13:  0.1000000000000000055511151231257827   -1.2480000000000002202682480856310576 116147          1  0.000000000000000000
14:  1.9995590000000000863167315401369706  -37.4517400699999996049882611259818077  36943          1  0.000000000000000000
15:  0.3914821535012809605724726225162158   -5.5786206873932533412130396754946560  86862          1  0.000000000000000000
16:  0.4893235858000000160217268785345368   -6.3122742568200003177025791956111789  88279          1  0.000000000000000000
17:  0.0001392130443151549901940194908789   -0.0016510667055777380248654528926977  72433          1  0.000000000000000000
18:  0.1000000000000000055511151231257827   -1.0160000000000000142108547152020037  68487          1  0.000000000000000000
19:  0.7211898100000000422227230956195854   -8.3946493884000012997148587601259351  28354          1  0.000000000000000000
20:  0.6650000000000000355271367880050093   -8.0265500000000002955857780762016773  80397          1  0.000000000000000000

Many thanks for any type of hint or solution.

CodePudding user response:

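Try the following code, which subsets dollvalue by the condition inside summarise(). Note that group_by() and %>% come from dplyr rather than plyr, and that the dplyr verb for per-group totals is summarise(), not aggregate(). Here df stands for your roundtrips data: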

library(dplyr)
df %>%
  group_by(id) %>%
  summarise(
    purchcost = sum(dollvalue[ispurchase == 1]), .groups = "drop")

Output:

# A tibble: 20 × 2
       id purchcost
    <int>     <dbl>
 1    883 -1.05e- 3
 2   2637 -8.05e  2
 3   3083 -9.46e- 1
 4  28354 -8.39e  0
 5  36927 -1.07e- 2
 6  36943 -3.75e  1
 7  52894 -1.45e  2
 8  68487 -1.02e  0
 9  72274 -1.07e  0
10  72433 -1.65e- 3
11  74580 -6.89e  2
12  80397 -8.03e  0
13  80993 -2.07e  2
14  86862 -5.58e  0
15  88279 -6.31e  0
16  91239 -1.11e- 1
17  94359 -1   e- 1
18 102645 -9.35e  0
19 116147 -1.25e  0
20 117464 -9.89e-18
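
For the base R attempt in the question, the likely cause of "arguments must have same length" is that x is filtered to the purchase rows while by still holds the id of every row, so the two no longer line up. A minimal sketch of one way around this is to filter the rows first and aggregate afterwards. The toy data frame below is made up purely for illustration and only borrows the column names from the question:

```r
# Toy data standing in for `roundtrips` (values are invented for illustration)
roundtrips <- data.frame(
  id         = c(1, 1, 2, 2, 3),
  dollvalue  = c(-10, 5, -2, -3, 7),
  ispurchase = c(1, 0, 1, 1, 0)
)

# Filter the rows first, so the values and the grouping column
# come from the same rows and have the same length
purchases <- roundtrips[roundtrips$ispurchase == 1, ]

# Sum dollvalue within each id
returntrip <- aggregate(dollvalue ~ id, data = purchases, FUN = sum)
names(returntrip)[2] <- "purchcost"

print(returntrip)
```

One difference from the summarise() approach: an id with no purchase rows is dropped from this result entirely, whereas sum(dollvalue[ispurchase == 1]) inside summarise() returns 0 for such an id.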