I have a dataframe where numbers for the variable value
are the total funding amounts of specific programs that apply throughout a certain region. Because the funding amounts are reported as totals, the individual amount received by an area cannot be determined, and the total amount is therefore displayed for each region covered by the program. I therefore need to average the amount give to each region based on the number of regions among which it is divided. How can I do this for each program?
My dataframe looks like the following:
program region value
a 01 100
b 02 250
b 03 250
b 04 250
c 01 200
c 03 200
d 02 600
e 01 700
f 01 100
f 04 100
The desired output would therefore be the following:
program region value new_value
a 01 100 100
b 02 250 83.333
b 03 250 83.333
b 04 250 83.333
c 01 200 100
c 03 200 100
d 02 600 600
e 01 700 700
f 01 100 50
f 04 100 50
CodePudding user response:
If df is your original dataframe
library(tidyverse)
df %>%
group_by(program) %>%
mutate(new_value = first(value) / n())
CodePudding user response:
Here's an R base solution
> df$new_value <- with(df, ave(value, program, FUN= function(x) x/length(x)))
> df
program region value new_value
1 a 1 100 100.00000
2 b 2 250 83.33333
3 b 3 250 83.33333
4 b 4 250 83.33333
5 c 1 200 100.00000
6 c 3 200 100.00000
7 d 2 600 600.00000
8 e 1 700 700.00000
9 f 1 100 50.00000
10 f 4 100 50.00000
CodePudding user response:
Using data.table
library(data.table)
setDT(df)[, new_value := first(value)/.N, program]
-output
> df
program region value new_value
<char> <int> <int> <num>
1: a 1 100 100.00000
2: b 2 250 83.33333
3: b 3 250 83.33333
4: b 4 250 83.33333
5: c 1 200 100.00000
6: c 3 200 100.00000
7: d 2 600 600.00000
8: e 1 700 700.00000
9: f 1 100 50.00000
10: f 4 100 50.00000
data
df <- structure(list(program = c("a", "b", "b", "b", "c", "c", "d",
"e", "f", "f"), region = c(1L, 2L, 3L, 4L, 1L, 3L, 2L, 1L, 1L,
4L), value = c(100L, 250L, 250L, 250L, 200L, 200L, 600L, 700L,
100L, 100L)), class = "data.frame", row.names = c(NA, -10L))