R divide the value of a variable by the number of its occurrences-CodePudding

I have a dataframe where numbers for the variable value are the total funding amounts of specific programs that apply throughout a certain region. Because the funding amounts are reported as totals, the individual amount received by an area cannot be determined, and the total amount is therefore displayed for each region covered by the program. I therefore need to average the amount give to each region based on the number of regions among which it is divided. How can I do this for each program?

My dataframe looks like the following:

program region value                                                                        
a        01    100
b        02    250
b        03    250
b        04    250
c        01    200
c        03    200
d        02    600
e        01    700
f        01    100
f        04    100

The desired output would therefore be the following:

program region value new_value                                                                      
a        01    100     100
b        02    250     83.333
b        03    250     83.333
b        04    250     83.333
c        01    200     100
c        03    200     100
d        02    600     600
e        01    700     700
f        01    100     50
f        04    100     50

CodePudding user response：

If df is your original dataframe

library(tidyverse)

df %>%
  group_by(program) %>%
  mutate(new_value = first(value) / n())

CodePudding user response：

Here's an R base solution

> df$new_value <- with(df, ave(value, program, FUN= function(x) x/length(x)))
> df
   program region value new_value
1        a      1   100 100.00000
2        b      2   250  83.33333
3        b      3   250  83.33333
4        b      4   250  83.33333
5        c      1   200 100.00000
6        c      3   200 100.00000
7        d      2   600 600.00000
8        e      1   700 700.00000
9        f      1   100  50.00000
10       f      4   100  50.00000

CodePudding user response：

Using data.table

library(data.table)
setDT(df)[, new_value := first(value)/.N, program]

-output

> df
    program region value new_value
     <char>  <int> <int>     <num>
 1:       a      1   100 100.00000
 2:       b      2   250  83.33333
 3:       b      3   250  83.33333
 4:       b      4   250  83.33333
 5:       c      1   200 100.00000
 6:       c      3   200 100.00000
 7:       d      2   600 600.00000
 8:       e      1   700 700.00000
 9:       f      1   100  50.00000
10:       f      4   100  50.00000

data

df <- structure(list(program = c("a", "b", "b", "b", "c", "c", "d", 
"e", "f", "f"), region = c(1L, 2L, 3L, 4L, 1L, 3L, 2L, 1L, 1L, 
4L), value = c(100L, 250L, 250L, 250L, 200L, 200L, 600L, 700L, 
100L, 100L)), class = "data.frame", row.names = c(NA, -10L))