Home > Mobile >  how to add row by specific value
how to add row by specific value

Time:10-03

For example, I have a dataset called data, and the column names are date min max avg. The total number of rows is 366.

I want to add the each seven rows to get the total value of min. e.g. 1-7 8-14. How can I do this.

CodePudding user response:

If you create a grouping column which increments after every 7 days you may apply all the answers from How to sum a variable by group .

Here's how you can do it in base R.

set.seed(123)
df <- data.frame(Date = Sys.Date() - 365:0, min = rnorm(366), max = runif(366))
df$group <- ceiling(seq(nrow(df))/7)
aggregate(min~group, df, sum)

#   group        min
#1      1  3.1438325
#2      2 -0.3022263
#3      3 -1.0769539
#4      4 -3.2934430
#5      5  2.8419110
#...

CodePudding user response:

This is a solution based on {tidyverse}, in particular using {dplyr} for the main operations and {lubridate} for formatting your dates.

First simulate some data - as you have not provided a reproducible dataset. I take the year 2020 which has 366 days ... obviously adapt this to your problem. For the min, max, and average values (columns) , let's generate some random numbers. Again, adapt this to your needs.

library(dplyr)  # for general data frame crunching
library(lubridate) # to coerce date-time

data <- data.frame(
   date = seq(from = lubridate::ymd("2020-01-01")
              , to = lubridate::ymd("2020-12-31"), by = 1)
  , min = sample(x = 1:10, size = 366, replace = TRUE)
  , max = sample(x = 10:15, size = 366, replace = TRUE)) %>% 
  dplyr::mutate(avg = mean(min   max))    

To group your data, inject a binning / grouping variable. The following is a generic "every 7th row" based on the modulo operator. If you want to group by weeks, etc. check out the {lubridate} documentation. You can get some useful bits out of dates for this. Or insert any other binning you need.

data <- data %>% 
  mutate(bin = c(0, rep(1:(nrow(data)-1)%/%7)))

This yields:

> data
# A tibble: 366 x 5
# Groups:   bin [53]
   date         min   max   avg   bin
   <date>     <int> <int> <dbl> <dbl>
 1 2020-01-01     2    11  18.2     0
 2 2020-01-02     6    14  18.2     0
 3 2020-01-03     7    13  18.2     0
 4 2020-01-04     6    15  18.2     0
 5 2020-01-05     3    10  18.2     0
 6 2020-01-06     5    12  18.2     0
 7 2020-01-07     5    12  18.2     0
 8 2020-01-08     7    13  18.2     1
 9 2020-01-09     8    11  18.2     1
10 2020-01-10     5    10  18.2     1

We can now summarise our grouped data. For this you use the bin-variable to group your data, and then summarise to perform aggregations on these groups. Based on your question, the following sums the min-values. Put the function/summary you need:

data %>% 
   group_by(bin) %>% 
   summarise(tot_min = sum(min))
# A tibble: 53 x 2
     bin tot_min
   <dbl>   <int>
 1     0      34
 2     1      31
 3     2      35
 4     3      44
 5     4      40
 6     5      50
 7     6      46
 8     7      38
 9     8      33
10     9      21
# ... with 43 more rows

Assign the result to your liking or whatever type of output you need. If you want to combine this with your original data dataframe, read up on bind_rows().

  •  Tags:  
  • r
  • Related