Grouping items in certain intervals and multiplying them with their associated values under another-CodePudding

df_final %>% 
mutate(Temp_Zones = cut(max_temp, c(0, 40, 60, 80, 100, 120))) 
%>% 
  group_by(Temp_Zones) %>% 
  summarize(X=n()) %>% 
  
 save(df_final, file = "df_final.RData")

So the code above groups the values under the ''max_temp'' column based on the parameter/degrees (0-40-60-80-100-120) and next to it creates a column listing how many times each temp_zones have. The problem I have is with taking it to the next level:

Each temperature row has a quantity associated with it. For instance if the weather is 45 quantity on the same row is 200.

So imagine there are'40 to 60' Temp_zone has 108 observations. For all these 108 observations, for each row I would like to get each the matching Quantity data and then add them up (sum)

So my ultimate goal is to be able to see how many in total each temp group 40-60 //60-80 etc. has quantity wise.

Thanks in advance for your help!

CodePudding user response：

Well you're very nearly there. One thing I strongly recommend, though perhaps I'm OCD, is that anytime you utilize group & summarize you always, always, always pipe into ungroup(). It will save a lot of time and confusion going forward.

In this instance,

df_final %>% 
    mutate(Temp_Zones = cut(max_temp, c(0, 40, 60, 80, 100, 120))) %>% 
    group_by(Temp_Zones)                                           %>% 
    summarize(X=n())                                               %>% 
    # Ungroup releases the prior group_by command and allows for a return
    # to transformations.
    # group_by & summarize are for aggregations (like your n() )
    # and until you ungroup you will be in 'aggregation' mode even though
    # what you're really after here is a transformation* after an aggregation was applied
    ungroup()                                                      %>%
    mutate(desired_column = Temp_Zones * X)                        %>%
    save(df_final, file = "df_final.RData")

However, it is not entirely clear what you're after. It might make more sense for you to look into the difference between shallow and deep copies, though that theoretical knowledge is something R will do under the hood for you. This methodology would require making a copy of your current df, apply the aggregation (the count) and joining it back to the original. This seems very needlessly inefficient for how Im understanding this, but I think it might (...?) be what you need/be easier to understand.

CodePudding user response：

Assuming you have columns Temp and Quantity and you want the total quantity for each Temp_Zone. So, one row per Temp_Zone.

This uses data.table:

##
#   MRE: YOU should provide this
#        or, even better, your actual data...
#
set.seed(1)   # for reproducible example
df   <- data.frame(Temp=rnbinom(1000, mu=70, size=100), Quantity=rpois(1000, 30))
brks <- c(0, 40, 60, 80, 100, 120)
##
#   you start here
#
library(data.table)
setDT(df)[
  , Temp_Zones:=cut(Temp, brks)][
  , .(.N, Quantity=sum(Quantity)), keyby=.(Temp_Zones)]
##    Temp_Zones   N Quantity
## 1:     (0,40]   2       67
## 2:    (40,60] 209     6219
## 3:    (60,80] 608    18561
## 4:   (80,100] 178     5354
## 5:  (100,120]   3       81