How to calculate gaps between two columns in R

Time:10-14

I want to calculate the gap between each row's "start" number and the previous row's "end" number. In the example data below, the expected result is in df$gap. For the first row, df$gap[1] = df$start[1] - 1; for every other row n, df$gap[n] = df$start[n] - df$end[n-1]. I can easily do this in Excel, but I'm having difficulty figuring out how to do it in R without a loop.

If anyone could provide a solution, that would be much appreciated!

df <- read.table(text = "start  end
   172  635
   766 1699
  1817 1891
  2015 2320", header = TRUE)

The expected result:

  start  end  gap
   172  635   171
   766 1699   131
  1817 1891   118
  2015 2320   124

CodePudding user response:

With dplyr, lag() gives the previous row's end; the first row, which has no previous end, is handled separately:

library(dplyr)

df %>%
  mutate(gap = start - lag(end)) %>%
  mutate(gap = ifelse(row_number() == 1, start - 1, gap))

Output:

    start  end gap
1   172  635 171
2   766 1699 131
3  1817 1891 118
4  2015 2320 124
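
As a possible simplification of the answer above, the two mutate() calls can probably be collapsed into one, because dplyr's lag() accepts a default argument for positions that have no previous value. This is only a sketch; the variable name res is an assumption, and default = 1L encodes the question's rule that the first gap is start[1] - 1.

```r
library(dplyr)

df <- read.table(text = "start end
172 635
766 1699
1817 1891
2015 2320", header = TRUE)

# default = 1L stands in for the missing "previous end" on the first row,
# so the first gap becomes start[1] - 1
res <- df %>%
  mutate(gap = start - lag(end, default = 1L))

res$gap
```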

CodePudding user response:

In base R:

df$gap <- df$start - c(1L, head(df$end, -1))

Gives:

df
  start  end gap
1   172  635 171
2   766 1699 131
3  1817 1891 118
4  2015 2320 124
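
To make the one-liner above easier to follow: head(df$end, -1) drops the last end value, and prepending 1 shifts the remaining values down by one row, so each start lines up with the previous end. The sketch below wraps that idea in a small function; gap_from_prev and its origin parameter are hypothetical names used for illustration only.

```r
# Hypothetical helper: gap between each start and the previous row's end.
# `origin` (an assumed name) is the value used in place of the
# non-existent "end" before the first row.
gap_from_prev <- function(start, end, origin = 1L) {
  prev_end <- c(origin, head(end, -1))  # drop the last end, prepend origin
  start - prev_end
}

df <- read.table(text = "start end
172 635
766 1699
1817 1891
2015 2320", header = TRUE)

df$gap <- gap_from_prev(df$start, df$end)
df$gap
```

Keeping origin as a parameter means the same function would still apply if the sequence were counted from a position other than 1.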

CodePudding user response:

dplyr plus a small trick could help with that:

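library(dplyr)

df <- read.table(text = "start  end
   172  635
   766 1699
  1817 1891
  2015 2320", header = TRUE)

# temp holds the previous row's end, with 1 standing in for the first row
df$temp <- c(1, df$end[-length(df$end)])

# The native pipe |> requires R 4.1 or later
mutate(df, gap = start - temp) |> select(-temp)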

Output:

  start  end gap
1   172  635 171
2   766 1699 131
3  1817 1891 118
4  2015 2320 124

CodePudding user response:

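One possible solution uses the data.table package. A reprex follows.

REPREX

library(data.table)

DT <- setDT(df)  # converts df to a data.table by reference

# shift(end, 1) lags end by one row, so end_prev is the previous row's end (NA in row 1)
DT[, end_prev := shift(end, 1)][, `:=` (gap = start - end_prev, end_prev = NULL)]

# Fill the NA gap in row 1 with start[1] - 1
setnafill(DT, fill = DT$start[1] - 1)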

DT
#>    start  end gap
#> 1:   172  635 171
#> 2:   766 1699 131
#> 3:  1817 1891 118
#> 4:  2015 2320 124

Created on 2021-10-13 by the reprex package (v0.3.0)
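
A shorter variant of the data.table approach may be possible, since shift() has a fill argument that supplies the value for rows without a previous entry. The sketch below assumes the same data as in the question and uses fill = 1L so that the first gap is start[1] - 1; it avoids the temporary column and the separate NA-filling step.

```r
library(data.table)

DT <- data.table(
  start = c(172L, 766L, 1817L, 2015L),
  end   = c(635L, 1699L, 1891L, 2320L)
)

# type = "lag" moves end down by one row; fill = 1L replaces the NA
# that would otherwise appear in the first row
DT[, gap := start - shift(end, n = 1L, fill = 1L, type = "lag")]

DT$gap
```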

CodePudding user response:

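If I understand your question correctly, one solution is the lag() function from dplyr. Passing default = 1 fills the first row, which has no previous end.

For instance:

df[, "gap"] <- df[, "start"] - dplyr::lag(df[, "end"], n = 1, default = 1)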