I have a data frame with grouped data sets, where the values are "thereofs" of the following row. Means in the following example, the Value from "One" is included in "two" which itself is included in "three" for each Group.
df <- tribble(~Group , ~Category , ~Value ,
'A' , 'one' , 2 ,
'A' , 'two' , 10 ,
'A' , 'three' , 20 ,
'B' , 'one' , 7 ,
'B' , 'two' , 11 ,
'B' , 'three' , 19) %>%
mutate(across(.cols = c('Group', 'Category') , .fns = factor))
I now want to replace the "thereof" values by incrementals, i.e. calculate the difference between consecutive rows per group like this:
df %>% group_by(Group) %>%
mutate(Value = Value - lag(Value))
But this leaves the first value (one) as NA, where I want to keep the original value instead. I can achieve this with this code:
df %>% group_by(Group) %>%
mutate(tmp = Value - lag(Value) ,
Value = coalesce(tmp , Value)) %>%
select(-tmp)
As I have in my real data frame many columns where I need to do this calculation, this approach leaves me with many temporary columns. So is there an easier solution for this?
CodePudding user response:
You could do c(first(Value), diff(Value))
instead. To do that over multiple columns, you can then use across
:
library(dplyr)
df %>%
group_by(Group) %>%
mutate(across(Value, ~ c(first(.x), diff(.x))))
Group Category Value
<fct> <fct> <dbl>
1 A one 2
2 A two 8
3 A three 10
4 B one 7
5 B two 4
6 B three 8
CodePudding user response:
We could use coalesce
like this without an intermediate variable:
library(dplyr)
df %>%
mutate(across(.cols = c('Group', 'Category') , .fns = factor)) %>%
group_by(Group) %>%
mutate(Value = coalesce(Value - lag(Value), Value))
Group Category Value
<fct> <fct> <dbl>
1 A one 2
2 A two 8
3 A three 10
4 B one 7
5 B two 4
6 B three 8