I have a dataframe like so:
id year month val
1 2020 1 50
1 2020 7 80
1 2021 1 40
1 2021 7 70
.
.
Now, I want to index all the values using Jan 2020 as index year for each id. Essentially group by id, then divide val with val at Jan 2020 * 100. So the final dataframe would look something like this:
id year month val
1 2020 1 100
1 2020 7 160
1 2021 1 80
1 2021 7 140
.
.
This is what I tried till now:
df %>% group_by(id) %>% mutate(val = 100*val/[val at Jan 2020])
I can separately get val at Jan 2020 like so:
df %>% filter(year==2020, month==1) %>% select(val)
But it doesn't work together:
df %>% group_by(id) %>% mutate(val = 100*val/(df %>% filter(year==2020, month==1) %>% select(val)))
The above throws error
CodePudding user response:
A dplyr
approach
library(dplyr)
df %>%
group_by(id) %>%
mutate(val = val / val[year == 2020 & month == 1] * 100) %>%
ungroup()
# A tibble: 4 × 4
id year month val
<int> <int> <int> <dbl>
1 1 2020 1 100
2 1 2020 7 160
3 1 2021 1 80
4 1 2021 7 140
CodePudding user response:
Base R
do.call(
rbind,
lapply(
split(df,df$id),
function(x){
cbind(
subset(x,select=-c(val)),
"val"=x$val/x$val[x$year==2020 & x$month==1]*100
)
}
)
)
id year month val
1.1 1 2020 1 100
1.2 1 2020 7 160
1.3 1 2021 1 80
1.4 1 2021 7 140