Home > Software design >  Using row indices values in matrix array to sum row values down column in a data frame in R - reprod
Using row indices values in matrix array to sum row values down column in a data frame in R - reprod

Time:03-22

I have a dataframe (df1) the contains a column of values. I also have a matrix (ma1) that contains two columns which refer to row indices in df1 (i.e., row 1, col 1 of ma1 is 4 and col 2 is 6 - these refer to row 4 and 6 of df1)

I need to be able to use the row indices in ma1 to sum the values of rows in df1 for each window. So i need the output to be sum of row 4-6, row 9 to 12, row 15 to 19 etc. - which correspond with the indices values in ma1.

I have read about rowSums but unsure how to full implement this in this example

Example: ma1 row 1 is 4 and 6, so I need output to be sum of rows 4,5,6 of df1. i.e., 3.37 0 1.19. Row 2 of ma1 is 9 and 12, so I need output to be sum of rows 9,10,11,12 of df1 etc etc

df1 <- structure(list(CO2 = c(3.37, 0, 1.19, 0.889999999999986, 5.88999999999999, 
    0.169999999999959, 3.92000000000002, 1.46000000000004, 1.23000000000002, 
    2.60000000000002, 1.39999999999998, 0, 4.35999999999996, 0.649999999999977, 
    0.149999999999977, 2.08999999999997, 4.23999999999995, 5.69, 
    0, 3.38, 1.95000000000005, 3.16999999999996, 2.82999999999998, 
    0, 1.69, 1.36000000000001, 0.669999999999959, 0.54000000000002, 
    0.529999999999973, 0.95999999999998, 0.600000000000023, 0.850000000000023, 
    0, 0.00999999999999091, 1.77999999999997, 1.98000000000002, 1.63, 
    2.74000000000001, 2.56, 3.50999999999999, 0, 0, 3.37, 0, 0.630000000000052, 
    0, 0.270000000000039, 0.769999999999982, 0.75, 1.25999999999999, 
    0, 0.689999999999998, 1.12, 0.210000000000036, 2.66000000000003, 
    3.14000000000004, 2.24000000000001, 0.620000000000005, 0.0900000000000318, 
    0)), row.names = c(NA, -60L), class = c("tbl_df", "tbl", "data.frame"
    ))

    ma1 <- structure(c(4, 6, 9, 12, 15, 19, 33, 37, 41, 54, 6, 9, 12, 15, 
    19, 24, 37, 41, 44, 60), .Dim = c(10L, 2L), .Dimnames = list(
        NULL, c("co2_start", "co2_end")))

CodePudding user response:

You can use apply to go over the rows of ma1, each time, using between(row_number(),....) to filter, and wrap in colSums(). Here, x, represents (each) row-vector of ma1

cbind(
  ma1,
  apply(ma1,1, function(x) colSums(filter(df1,between(row_number(),x[1],x[2]))))
)

Output:

      co2_start co2_end      
 [1,]         4       6  6.95
 [2,]         6       9  6.78
 [3,]         9      12  5.23
 [4,]        12      15  5.16
 [5,]        15      19 12.17
 [6,]        19      24 11.33
 [7,]        33      37  5.40
 [8,]        37      41 10.44
 [9,]        41      44  3.37
[10,]        54      60  8.96

This data.table approach will be faster

library(data.table)
ma1 = data.table(ma1)
ma1[, colSums(df1[co2_start:co2_end,]), by=1:nrow(ma1)]

as would this base R approach.. It is the filter/between from tidyverse that slows things down

cbind(
  ma1,
  apply(ma1,1, function(x) colSums(df1[x[1]:x[2],]))
)

CodePudding user response:

This approach returns only the sum of the indexes you actually want rather than every sequential pairwise index.

ma1_sum = list()
ma1_indexes = list()
df1_values_summed = list()
for(i in 1:length(ma1)){
  j = i
  if(i < length(ma1)){
  if(i %% 2 == 0){i = i   1}
  ma1_indexes[[j]] = toString(rep(ma1[i]:ma1[i 1]))
  df1_values_summed[[j]] = toString(df1$CO2[rep(ma1[i]:ma1[i 1])])
  ma1_sum[[j]] = sum(df1$CO2[rep(ma1[i]:ma1[i 1])]) 
  } else return(NULL)
}

data.frame(result = unlist(ma1_sum[seq(1, length(ma1_sum), 2)]),
           indexes = unlist(ma1_indexes[seq(1, length(ma1_sum), 2)]),
           values_summed = unlist(df1_values_summed[seq(1, length(df1_values_summed), 2)])
)

Output:

result indexes                                                            values_summed                                                                                   
    <dbl> <chr>                                                              <chr>                                                                                           
 1   6.95 4, 5, 6                                                            0.889999999999986, 5.88999999999999, 0.169999999999959                                          
 2   5.23 9, 10, 11, 12                                                      1.23000000000002, 2.60000000000002, 1.39999999999998, 0                                         
 3  12.2  15, 16, 17, 18, 19                                                 0.149999999999977, 2.08999999999997, 4.23999999999995, 5.69, 0                                  
 4   5.40 33, 34, 35, 36, 37                                                 0, 0.00999999999999091, 1.77999999999997, 1.98000000000002, 1.63                                
 5   9.07 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54             0, 0, 3.37, 0, 0.630000000000052, 0, 0.270000000000039, 0.769999999999982, 0.75, 1.259999999999…
 6   6.78 6, 7, 8, 9                                                         0.169999999999959, 3.92000000000002, 1.46000000000004, 1.23000000000002                         
 7   5.16 12, 13, 14, 15                                                     0, 4.35999999999996, 0.649999999999977, 0.149999999999977                                       
 8  11.3  19, 20, 21, 22, 23, 24                                             0, 3.38, 1.95000000000005, 3.16999999999996, 2.82999999999998, 0                                
 9  10.4  37, 38, 39, 40, 41                                                 1.63, 2.74000000000001, 2.56, 3.50999999999999, 0                                               
10  14.5  44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60 0, 0.630000000000052, 0, 0.270000000000039, 0.769999999999982, 0.75, 1.25999999999999, 0, 0.689…
  • Related