I have a data frame such as:
Sp1  start    end
A      100   1077
B     2316   4088
B    26647  28746
C      450    789
D       23    499
D    45999  60000
and I would like to add two new columns, new_start and new_end, that place the intervals end to end: each row keeps its length (end - start) and starts one position after the previous row's new_end.
For the first row, new_start and new_end are simply start and end:
Sp1  start    end  new_start  new_end
A      100   1077        100     1077
Then for the first B:
Sp1  start    end  new_start  new_end
A      100   1077        100     1077
B     2316   4088       1078     2850
where 1078 = 1077 + 1
and   2850 = 1078 + (4088 - 2316)
The second B follows the same rule:
Sp1  start    end  new_start  new_end
A      100   1077        100     1077
B     2316   4088       1078     2850
B    26647  28746       2851     4950
- where 2851 = 2850 + 1
- where 4950 = 2851 + (28746 - 26647)
And the same for C:
Sp1  start    end  new_start  new_end
A      100   1077        100     1077
B     2316   4088       1078     2850
B    26647  28746       2851     4950
C      450    789       4951     5290
- where 4951 = 4950 + 1
- where 5290 = 4951 + (789 - 450)
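Written as a recurrence, with w_i = end_i - start_i (notation introduced here, not from the question), the rule in these examples unrolls into a closed form. That closed form is what the cumsum()-based approaches compute:

```latex
\begin{aligned}
w_i &= \mathrm{end}_i - \mathrm{start}_i \\
\mathrm{new\_start}_1 &= \mathrm{start}_1, \qquad
\mathrm{new\_start}_i = \mathrm{new\_end}_{i-1} + 1 \quad (i \ge 2) \\
\mathrm{new\_end}_i &= \mathrm{new\_start}_i + w_i \\
\Rightarrow\quad \mathrm{new\_end}_i &= \mathrm{start}_1 + \sum_{j=1}^{i} w_j + (i - 1), \qquad
\mathrm{new\_start}_i = \mathrm{new\_end}_i - w_i
\end{aligned}
```

Check against the second B (i = 3): 100 + (977 + 1772 + 2099) + 2 = 4950.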
So at the end, I should get the expected result:
Sp1  start    end  new_start  new_end
A      100   1077        100     1077
B     2316   4088       1078     2850
B    26647  28746       2851     4950
C      450    789       4951     5290
If someone has an idea, it would be amazing!
Here is the data frame (first four rows, A to C) in dput format, in case it helps:
structure(list(Sp1 = c("A", "B", "B", "C"), start = c(100L, 2316L,
26647L, 450L), end = c(1077, 4088, 28746, 789)), class = "data.frame", row.names = c(NA,
-4L))
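For checking any vectorised answer, a plain loop that applies the rule row by row (first row copied, every later row starting at the previous new_end + 1 and keeping its width end - start) is a handy reference. This is a minimal sketch on the dput data above, not the only way to write it:

```r
# Data frame from the dput above (rows A, B, B, C)
df <- structure(list(Sp1 = c("A", "B", "B", "C"), start = c(100L, 2316L,
  26647L, 450L), end = c(1077, 4088, 28746, 789)), class = "data.frame",
  row.names = c(NA, -4L))

n <- nrow(df)
new_start <- numeric(n)
new_end <- numeric(n)
for (i in seq_len(n)) {
  # Row 1 keeps its own start; later rows begin one past the previous new_end
  new_start[i] <- if (i == 1) df$start[1] else new_end[i - 1] + 1
  # Each row keeps its original width, end - start
  new_end[i] <- new_start[i] + (df$end[i] - df$start[i])
}
df$new_start <- new_start
df$new_end <- new_end
df
```

The loop reproduces the expected columns (100/1077, 1078/2850, 2851/4950, 4951/5290); the cumsum() answers compute the same values without an explicit loop.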
Answer:
If you are using the tidyverse, you could do this:
library(dplyr)
df %>% mutate(new_end = start[1] - 1 + cumsum(end + 1) - cumsum(start),
              new_start = lag(new_end, default = start[1] - 1) + 1)
Sp1 start end new_end new_start
1 A 100 1077 1077 100
2 B 2316 4088 2850 1078
3 B 26647 28746 4950 2851
4 C 450 789 5290 4951
5 D 23 499 5767 5291
6 D 45999 60000 19769 5768
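The same formula also works without dplyr. For a plain vector, dplyr's lag(x, default = d) shifts x down by one position and fills the first slot with d, which c(d, head(x, -1)) reproduces in base R. A minimal sketch using the six-row table from the question:

```r
# Six-row table from the question (A to D)
df <- data.frame(
  Sp1   = c("A", "B", "B", "C", "D", "D"),
  start = c(100, 2316, 26647, 450, 23, 45999),
  end   = c(1077, 4088, 28746, 789, 499, 60000)
)

# Same expression as in the mutate() call above
df$new_end <- with(df, start[1] - 1 + cumsum(end + 1) - cumsum(start))
# Base R stand-in for lag(new_end, default = start[1] - 1)
df$new_start <- c(df$start[1] - 1, head(df$new_end, -1)) + 1
df
```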
Answer:
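Base R:
diff <- df$end - df$start   # width of each interval
# 0:(nrow(df) - 1L) adds the +1 gap accumulated between consecutive rows
df$new_start <- cumsum(c(df$start[1], diff[-nrow(df)])) + 0:(nrow(df) - 1L)
df$new_end <- df$new_start + diff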
df
#> Sp1 start end new_start new_end
#> 1 A 100 1077 100 1077
#> 2 B 2316 4088 1078 2850
#> 3 B 26647 28746 2851 4950
#> 4 C 450 789 4951 5290
#> 5 D 23 499 5291 5767
#> 6 D 45999 60000 5768 19769
Answer:
We can try:
transform(
  df,
  new_start = c(start[1], (cumsum(end - start) + start[1] + seq_along(start))[-length(start)]),
  new_end = cumsum(end - start) + start[1] + seq_along(start) - 1
)
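The base R answer and the transform() answer compute the same closed form (start[1] plus the cumulative widths plus a row offset), so a quick check on the six-row table from the question should show identical columns. A minimal sketch; w, ns_base, ne_base and res are names introduced here for illustration:

```r
# Six-row table from the question (A to D)
df <- data.frame(
  Sp1   = c("A", "B", "B", "C", "D", "D"),
  start = c(100, 2316, 26647, 450, 23, 45999),
  end   = c(1077, 4088, 28746, 789, 499, 60000)
)

# Base R answer: cumulative widths plus a 0, 1, 2, ... offset for the +1 gaps
w <- df$end - df$start
ns_base <- cumsum(c(df$start[1], w[-nrow(df)])) + 0:(nrow(df) - 1L)
ne_base <- ns_base + w

# transform() answer, unchanged
res <- transform(
  df,
  new_start = c(start[1], (cumsum(end - start) + start[1] + seq_along(start))[-length(start)]),
  new_end = cumsum(end - start) + start[1] + seq_along(start) - 1
)

# Both approaches should produce the same columns
stopifnot(isTRUE(all.equal(res$new_start, ns_base)),
          isTRUE(all.equal(res$new_end, ne_base)))
res
```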