Cumulatively add number between rows in order to create new columns BUT with different way for the fir-CodePudding

This post is related to this one: Cumulatively add number between rows in order to create new columns in R, but with different rules.

I have a dataframe such as

Sp1 start end  
A   100   1077 
B   2316  4088
B   26647 28746
B   50000 60000
C   450    789

and I would like to add two new columns in which I cumulatively add the previous start and end coordinates.

However, the first time a given Sp1 value appears is not treated the same way as its later occurrences in the dataframe.

The first row should always keep its own coordinates:

Sp1 start end     new_start  new_end 
A   100   1077    100        1077

Then for the first B :

Sp1 start end     new_start  new_end 
A   100   1077    100        1077
B   2316  4088    1078       2850

where 1078 = 1077 + 1
where 2850 = 1078 + (4088-2316)

For the next B the rule is different, since it is not the first B:

Sp1 start end     new_start  new_end 
A   100   1077    100        1077
B   2316  4088    1078       2850
B   26647 28746   25409      27508

where 2850 + (26647-4088) = 25409
where 25409 + (28746-26647) = 27508

For the third B, the rule is the same as for the second B:

Sp1 start end     new_start  new_end 
A   100   1077    100        1077
B   2316  4088    1078       2850
B   26647 28746   25409      27508
B   50000 60000   48762      58762

where 27508 + (50000-28746) = 48762
where 48762 + (60000-50000) = 58762

C appears here for the first time, so I apply the first-occurrence rule:

Sp1 start end     new_start  new_end 
A   100   1077    100        1077
B   2316  4088    1078       2850
B   26647 28746   25409      27508
B   50000 60000   48762      58762
C   450   789     58763      59101

where 58763 = 58762 + 1
where 59101 = 58762 + (789-450)

In the end, I should get the expected result:

Sp1 start end     new_start  new_end 
A   100   1077    100        1077
B   2316  4088    1078       2850
B   26647 28746   25409      27508
B   50000 60000   48762      58762
C   450   789     58763      59101

If someone has an idea, that would be amazing!

Here is the dataframe in dput format, in case it helps:

structure(list(Sp1 = c("A", "B", "B", "B", "C"), start = c(100L, 
2316L, 26647L, 50000L, 450L), end = c(1077, 4088, 28746, 60000, 
789)), class = "data.frame", row.names = c(NA, -5L))

CodePudding user response：

Since you can't compute all columns at once (you need to wait for previous iterations to be able to compute the result for line i), just use a loop. The row number per Sp1 can be done at once using dplyr:

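library(dplyr)

# Number the rows within each Sp1 group (1 = first occurrence)
df <- df %>% group_by(Sp1) %>% mutate(sp_row = row_number()) %>% ungroup()
df$new_start <- df$new_end <- NA
# The first row keeps its original coordinates
df$new_start[1] <- df$start[1]
df$new_end[1] <- df$end[1]
for( i in 2:nrow(df)) {
  if(df$sp_row[i]==1) {
    # First occurrence of this Sp1: start right after the previous new_end
    df$new_start[i] <- df$new_end[i-1] + 1
    df$new_end[i] <- df$new_start[i] + df$end[i]-df$start[i]
  }
  if(df$sp_row[i]!=1) {
    # Later occurrence: subtract the previous new_end from the start
    df$new_start[i] <- df$start[i]-df$new_end[i-1]
    df$new_end[i] <- df$new_start[i] + df$end[i]-df$start[i]
  }
}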
# A tibble: 5 x 6
  Sp1   start   end new_start new_end sp_row
  <chr> <int> <dbl>     <dbl>   <dbl>  <int>
1 A       100  1077       100    1077      1
2 B      2316  4088      1078    2850      1
3 B     26647 28746     23797   25896      2
4 B     50000 60000     24104   34104      3
5 C       450   789     34105   34444      1

By the way, there is at least one mistake in your example: 50000-25896 = 29053 is wrong (it is 24104).
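
Note that the loop above computes new_start for repeated rows as start[i] - new_end[i-1], whereas the question's worked example uses new_end[i-1] + (start[i] - end[i-1]). Below is a minimal vectorised sketch of the question's rules in base R. It assumes that the interval length end - start is always preserved and that a gap of 1 is used the first time an Sp1 value appears. The helper names width, is_first, prev_end, gap and step are illustrative and do not come from the original posts.

```r
# Sample data from the question
df <- data.frame(
  Sp1   = c("A", "B", "B", "B", "C"),
  start = c(100, 2316, 26647, 50000, 450),
  end   = c(1077, 4088, 28746, 60000, 789)
)

n        <- nrow(df)
width    <- df$end - df$start      # interval length, kept unchanged
is_first <- !duplicated(df$Sp1)    # TRUE the first time each Sp1 appears
prev_end <- c(NA, df$end[-n])      # end of the previous row (NA for row 1)

# Gap to the previous interval: 1 for a first occurrence (assumption),
# otherwise the original distance start[i] - end[i-1]
gap <- ifelse(is_first, 1, df$start - prev_end)

# Each row advances new_end by gap + width; row 1 is anchored at its own end
step    <- gap + width
step[1] <- df$end[1]

df$new_end   <- cumsum(step)
df$new_start <- df$new_end - width
print(df)
```

On the sample data this reproduces the question's new_start column and the first four new_end values. For C it gives new_end = 59102, because 58763 + (789-450) = 59102, whereas the question lists 59101 (= 58762 + (789-450)); if the latter is what is wanted, the first-occurrence rule for new_end would need to be adjusted accordingly.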