Cumulatively add number between rows in order to create new columns in R

Time:08-11

I have a dataframe such as

Sp1 start end
A   100   1077
B   2316  4088
B   26647 28746
C   450   789
D   23    499
D   45999 60000

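and I would like to add two new columns, new_start and new_end, that lay the intervals end to end: each interval keeps its length (end - start) and begins one position after the previous row's new_end.

For the first row, the new columns should always equal the original start and end: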

Sp1 start end     new_start  new_end 
A   100   1077    100        1077

Then for the first B :

Sp1 start end     new_start  new_end 
A   100   1077    100        1077
B   2316  4088    1078       2850
  • where 1078 = 1077 + 1
  • where 2850 = 1078 + (4088-2316)

For the other B it is the same:

Sp1 start end     new_start  new_end 
A   100   1077    100        1077
B   2316  4088    1078       2850
B   26647 28746   2851       4950
  • where 2851 = 2850 + 1
  • where 4950 = 2851 + (28746-26647)

For C it is the same:

Sp1 start end     new_start  new_end 
A   100   1077    100        1077
B   2316  4088    1078       2850
B   26647 28746   2851       4950
C   450   789     4951       5290
  • where 4951 = 4950 + 1
  • where 5290 = 4951 + (789-450)

Then at the end, I should get the expected result:

Sp1 start end     new_start  new_end 
A   100   1077    100        1077
B   2316  4088    1078       2850
B   26647 28746   2851       4950
C   450   789     4951       5290
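Read as a recurrence, the walk-through above amounts to the loop below. This is only a sketch, assuming the six-row data frame shown at the top of the question; the helper vectors `new_start` and `new_end` and the counter `n` are illustrative names, and a vectorised version would normally be preferred in R.

```r
# Six-row example data from the top of the question
df <- data.frame(
  Sp1   = c("A", "B", "B", "C", "D", "D"),
  start = c(100, 2316, 26647, 450, 23, 45999),
  end   = c(1077, 4088, 28746, 789, 499, 60000)
)

n <- nrow(df)
new_start <- numeric(n)
new_end   <- numeric(n)

for (i in seq_len(n)) {
  # The first row keeps its own start; every later row begins
  # one position after the previous row's new_end
  new_start[i] <- if (i == 1) df$start[1] else new_end[i - 1] + 1
  # Each interval keeps its original length (end - start)
  new_end[i] <- new_start[i] + (df$end[i] - df$start[i])
}

df$new_start <- new_start
df$new_end   <- new_end
df
```

The loop makes the dependency explicit: each row needs only the previous row's new_end, which is why a cumulative sum can replace it.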

If someone has an idea, it would be amazing!

Here is the dataframe in dput format, if it helps:

structure(list(Sp1 = c("A", "B", "B", "C"), start = c(100L, 2316L, 
26647L, 450L), end = c(1077, 4088, 28746, 789)), class = "data.frame", row.names = c(NA, 
-4L))

CodePudding user response:

If you are using the tidyverse you could do this...

library(dplyr)
df %>% mutate(new_end = start[1] - 1 + cumsum(end + 1) - cumsum(start),
              new_start = lag(new_end, default = start[1] - 1) + 1)

  Sp1 start   end new_end new_start
1   A   100  1077    1077       100
2   B  2316  4088    2850      1078
3   B 26647 28746    4950      2851
4   C   450   789    5290      4951
5   D    23   499    5767      5291
6   D 45999 60000   19769      5768
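Why a cumulative sum reproduces the row-by-row rule can be seen from a short derivation. The notation is introduced here for illustration and is not from the original post: $s_i$ and $e_i$ stand for new_start and new_end in row $i$, and $e_0$ is a convenience value defined so that the first row fits the same pattern.

```latex
\begin{align*}
s_1 &= \mathrm{start}_1, \qquad s_i = e_{i-1} + 1 \quad (i \ge 2), \qquad
e_i = s_i + (\mathrm{end}_i - \mathrm{start}_i) \\
\text{Define } e_0 &= \mathrm{start}_1 - 1, \text{ so that } s_i = e_{i-1} + 1 \text{ for all } i \ge 1. \\
e_i &= e_{i-1} + (\mathrm{end}_i + 1) - \mathrm{start}_i \\
    &= \mathrm{start}_1 - 1 + \sum_{j=1}^{i} (\mathrm{end}_j + 1) - \sum_{j=1}^{i} \mathrm{start}_j
\end{align*}
```

The last line has the same terms as the new_end expression in the mutate() call, and new_start is that value shifted down one row, plus 1.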

CodePudding user response:

Base R:

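# Length of each interval (note: this variable masks base::diff for the session)
diff <- df$end - df$start
# Running total of the previous lengths, plus one extra position per earlier row
df$new_start <- cumsum(c(df$start[1], diff[-nrow(df)])) + 0:(nrow(df) - 1L)
df$new_end <- df$new_start + diff
df
#>   Sp1 start   end new_start new_end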
#> 1   A   100  1077       100    1077
#> 2   B  2316  4088      1078    2850
#> 3   B 26647 28746      2851    4950
#> 4   C   450   789      4951    5290
#> 5   D    23   499      5291    5767
#> 6   D 45999 60000      5768   19769

CodePudding user response:

We can try

transform(
  df,
  new_start = c(start[1], (cumsum(end - start) + start[1] + seq_along(start))[-length(start)]),
  new_end = cumsum(end - start) + start[1] + seq_along(start) - 1
)
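As a quick sanity check, the two base R approaches can be compared on the same input. This is a sketch, assuming the six-row data frame from the top of the question; `len`, `base_start`, `base_end` and `tr` are illustrative names, not from the answers.

```r
# Six-row example data from the top of the question
df <- data.frame(
  Sp1   = c("A", "B", "B", "C", "D", "D"),
  start = c(100, 2316, 26647, 450, 23, 45999),
  end   = c(1077, 4088, 28746, 789, 499, 60000)
)

# cumsum approach: start of the first row plus the lengths of all previous rows,
# shifted by one extra position per earlier row
len        <- df$end - df$start
base_start <- cumsum(c(df$start[1], len[-nrow(df)])) + 0:(nrow(df) - 1L)
base_end   <- base_start + len

# transform() approach: same quantities computed inside the data frame
tr <- transform(
  df,
  new_start = c(start[1], (cumsum(end - start) + start[1] + seq_along(start))[-length(start)]),
  new_end   = cumsum(end - start) + start[1] + seq_along(start) - 1
)

# Both approaches should give identical columns; stopifnot() errors otherwise
stopifnot(all(tr$new_start == base_start), all(tr$new_end == base_end))
tr
```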