I have a database with information on the population of each racial group that inhabits different neighborhoods, like this one:
Neighborhood Race1 Race2 Race3
<chr> <dbl> <dbl> <dbl>
1 A 19 1 16
2 B 20 17 2
3 C 15 2 7
4 D 17 4 15
5 E 15 20 14
So, I want to calculate the percentage of Race1, 2 and 3 for any neighborhood. To do so, I use this code:
#Firstly, I calculate the total of population by neighborhood.
df$Total=df$Race1 df$Race2 df$Race3
#Then, start the calculation of percentages
df$Race1_pr=df$Race1/df$Total
df$Race2_pr=df$Race2/df$Total
df$Race3_pr=df$Race3/df$Total
#And I obtain this:
head(df)
Neighborhood Race1 Race2 Race3 Total Race1_pr Race2_pr Race3_pr
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 19 1 16 36 0.528 0.0278 0.444
2 B 20 17 2 39 0.513 0.436 0.0513
3 C 15 2 7 24 0.625 0.0833 0.292
4 D 17 4 15 36 0.472 0.111 0.417
5 E 15 20 14 49 0.306 0.408 0.286
The result is correct, but if, for example, I were to analyze 100 races, the code I am using now would be very tedious, both to calculate the total population and the percentages. I want to know if there is a way to iterate the process I am currently following with my code.
CodePudding user response:
A possible solution, edited to incorporate the suggestion, offered by @stefan, to use rowSums
:
library(tidyverse)
df <- data.frame(
stringsAsFactors = FALSE,
Neighborhood = c("A", "B", "C", "D", "E"),
Race1 = c(19L, 20L, 15L, 17L, 15L),
Race2 = c(1L, 17L, 2L, 4L, 20L),
Race3 = c(16L, 2L, 7L, 15L, 14L)
)
df %>%
mutate(Total = rowSums(across(starts_with("Race"))),
across(starts_with("Race"), ~ ./Total, .names = "{.col}_pr"))
#> Neighborhood Race1 Race2 Race3 Total Race1_pr Race2_pr Race3_pr
#> 1 A 19 1 16 36 0.5277778 0.02777778 0.44444444
#> 2 B 20 17 2 39 0.5128205 0.43589744 0.05128205
#> 3 C 15 2 7 24 0.6250000 0.08333333 0.29166667
#> 4 D 17 4 15 36 0.4722222 0.11111111 0.41666667
#> 5 E 15 20 14 49 0.3061224 0.40816327 0.28571429
CodePudding user response:
Traditional R solution. Obviously the most efficient solution will use rowSums
df <- cbind(df, # `cbind` aligns columns side-by-side
# Now just work on the columns that have "Race" in them
df[grepl("Race", names(df))]/ rowSums( df[grepl("Race", names(df))]) )
df
Might need to do some work on the names of the new columns. The setNames function might be useful in that task.