Creating a function to run a conditional Sum in R

Time:12-03

I have a dataframe like this:

dat<- data.frame (
                  'Ones'=c(0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0), 
                  'Thats'=c(0,5,3,6,8,4,5,6,8,3,1,3,4,5,6,7,4,3,4,5))

I need to create a function (gap1) that detects each 1 in Ones and then sums rows n-1, n and n+1 of Thats, where n is the row containing the 1.

For example, this dataset contains two 1s.

dat<- data.frame (
  'Ones'=c(0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0), 
  'Thats'=c(0,5,3,6,8,4,5,6,8,3,1,3,4,5,6,7,4,3,4,5))
dat

This should be the output:

 Ones  Thats  gap1
 1      4     17   #(8+4+5)
 1      1      7   #(3+1+3)

I would like to extend this gap at will, for example:

   Ones  Thats      gap1       gap2            gap3 ...
   1      4         17         29  #(6+8+4+5+6)
   1      1         7          19  #(8+3+1+3+4)
   

There is another problem I have to consider: Suppose we have this data frame:

 dat<- data.frame (
   'Ones'=c(1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0), 
   'Thats'=c(0,5,3,6,8,4,5,6,8,3,1,NA,4,5,6,7,4,3,4,5))

In case there is a 1 at the beginning (or at the end), or if there is an NA, the function should use available data.

In this case, for example:

   Ones  Thats        gap1            gap2
   1         0         5 (0+5)        8     #(0+5+3)
   1         4         17 (8+4+5)     29    #(6+8+4+5+6)
   1         1         4 (3+1+NA)     16    #(8+3+1+NA+4)


 

Do you have any advice?

CodePudding user response:

Using tidyverse / collapse

For an arbitrary number of leads and lags, the collapse package offers a convenient function, flag, which also takes arguments to select columns (cols) or grouping variables (g).

library(dplyr)
f <- function(df, n){
  df %>%
    collapse::flag(-n:n) %>%
    transmute(Ones, Thats, gap = rowSums(., na.rm = T) - 1) %>%
    filter(Ones == 1)
}

x <- data.frame (
  'Ones'=c(1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0), 
  'Thats'=c(0,5,3,6,8,4,5,6,8,3,1,NA,4,5,6,7,4,3,4,5))

# we can now specify how many lags to count:
f(x, 1) 
  Ones Thats gap
1    1     0   5
2    1     4  17
3    1     1   4
f(x, 2)
  Ones Thats gap
1    1     0   8
2    1     4  29
3    1     1  16

Or, if you want to choose how many gap columns to compute, the function can be simplified to:

f <- function(df, n){
  df %>%
    collapse::flag(-n:n) %>%
    rowSums(na.rm = T) - 1
}

x %>%
  mutate(gap1 = f(., 1),
         gap2 = f(., 2)) %>%
  filter(Ones == 1)
  Ones Thats gap1 gap2
1    1     0    5    8
2    1     4   17   29
3    1     1    4   16
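
Note that the `- 1` in the functions above compensates for the Ones column being part of the row sum, so it presumes that no second 1 falls inside the window. A minimal base R sketch of the same lead/lag idea that sums only Thats might look like this; shift_vec and gap_sum are hypothetical helper names, not part of collapse or dplyr.

```r
# Hypothetical helper: shift a vector by k positions, padding with NA.
# k > 0 gives a lag (earlier values), k < 0 gives a lead (later values).
shift_vec <- function(v, k) {
  len <- length(v)
  if (k == 0) return(v)
  if (abs(k) >= len) return(rep(NA, len))
  if (k > 0) c(rep(NA, k), v[seq_len(len - k)])
  else c(v[(1 - k):len], rep(NA, -k))
}

# Hypothetical helper: for every row, sum the values from n rows before
# to n rows after, ignoring NAs and positions outside the vector.
gap_sum <- function(v, n) {
  shifted <- do.call(cbind, lapply(-n:n, function(k) shift_vec(v, k)))
  rowSums(shifted, na.rm = TRUE)
}

x <- data.frame(
  Ones  = c(1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0),
  Thats = c(0,5,3,6,8,4,5,6,8,3,1,NA,4,5,6,7,4,3,4,5))

x$gap1 <- gap_sum(x$Thats, 1)
x$gap2 <- gap_sum(x$Thats, 2)
res <- x[x$Ones == 1, ]
res
```

Because only Thats enters the sum, no correction term is needed, at the cost of building the shifted columns by hand.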

Base R

If you like terse functions (this one still relies on collapse::flag):

f <- Vectorize(\(df, n) rowSums(collapse::flag(df, -n:n), na.rm = T) - 1, "n")
x[paste0("gap", 1:2)] <- f(x, 1:2) ; subset(x, Ones == 1)
   Ones Thats gap1 gap2
1     1     0    5    8
6     1     4   17   29
11    1     1    4   16

CodePudding user response:

With base R:

myfun <- function(data,gap=1) {

        points <- which(data["Ones"]==1)
        sapply(points, function(x) {

            bottom <- ifelse(x - gap <= 0, 1, x - gap)
            top <- ifelse(x + gap > nrow(data), nrow(data), x + gap)

            sum(data[bottom:top,"Thats"], na.rm=T)
        })
 }

#> myfun(dat,1)
#[1]  5 17  4
#> myfun(dat,2)
#[1]  8 29 16
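
To extend the gap at will, as the question asks, a function of this kind can be applied over several widths in one go. The following is a minimal sketch under that assumption; window_sum is a hypothetical stand-alone helper that mirrors the clamping logic above, and the gap1, gap2, ... column names simply follow the layout in the question.

```r
# Hypothetical helper: for each row where Ones == 1, sum Thats over the
# rows from (row - gap) to (row + gap), clamped to the data frame's range.
window_sum <- function(data, gap = 1) {
  points <- which(data$Ones == 1)
  sapply(points, function(i) {
    lo <- max(1, i - gap)
    hi <- min(nrow(data), i + gap)
    sum(data$Thats[lo:hi], na.rm = TRUE)
  })
}

dat <- data.frame(
  Ones  = c(1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0),
  Thats = c(0,5,3,6,8,4,5,6,8,3,1,NA,4,5,6,7,4,3,4,5))

# One column per requested width
widths <- 1:3
gaps <- lapply(widths, function(w) window_sum(dat, w))
names(gaps) <- paste0("gap", widths)

out <- cbind(dat[dat$Ones == 1, ], as.data.frame(gaps))
out
```

Changing widths (for example to 1:5) adds further gap columns without touching the helper.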

CodePudding user response:

Another base R solution

f <- function(dat, width = 1)
{
  dat$gaps <- sapply(seq(nrow(dat)), function(x) {
      if(dat$Ones[x] == 0) return(0)
      i <- x + seq(2 * width + 1) - (width + 1)
      i <- i[i > 0]
      i <- i[i <= nrow(dat)]            # keep the last row in range
      sum(dat$Thats[i], na.rm = TRUE)   # use available data when NA occurs
    })
  dat[dat$Ones == 1,]
}

f(dat, 1)
#>    Ones Thats gaps
#> 6     1     4   17
#> 11    1     1    7

f(dat, 2)
#>    Ones Thats gaps
#> 6     1     4   29
#> 11    1     1   19
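
With index-filtering approaches like this one, the comparison against the number of rows has to be inclusive, otherwise a window that reaches the last row silently loses it. A small sketch of the clipping step on its own, where window_idx is a hypothetical helper name and 20 is the number of rows in the example data:

```r
# Hypothetical helper: row indices of a window of half-width `width`
# around row x, clipped to the valid range 1..n.
window_idx <- function(x, width, n) {
  i <- seq(x - width, x + width)
  i[i >= 1 & i <= n]
}

first_rows <- window_idx(1, 2, 20)   # window around a 1 in the first row
last_rows  <- window_idx(20, 1, 20)  # window around a 1 in the last row
first_rows
last_rows
```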