Home > Mobile >  Iteration of operations on several columns with R
Iteration of operations on several columns with R

Time:12-08

I have a database with information on the population of each racial group that inhabits different neighborhoods, like this one:

Neighborhood Race1 Race2 Race3
  <chr>        <dbl> <dbl> <dbl>
1 A               19     1    16
2 B               20    17     2
3 C               15     2     7
4 D               17     4    15
5 E               15    20    14

So, I want to calculate the percentage of Race1, 2 and 3 for any neighborhood. To do so, I use this code:

#Firstly, I calculate the total of population by neighborhood.
df$Total=df$Race1 df$Race2 df$Race3
#Then, start the calculation of percentages
df$Race1_pr=df$Race1/df$Total
df$Race2_pr=df$Race2/df$Total
df$Race3_pr=df$Race3/df$Total
#And I obtain this:
head(df)
  Neighborhood Race1 Race2 Race3 Total Race1_pr Race2_pr Race3_pr
  <chr>        <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>    <dbl>
1 A               19     1    16    36    0.528   0.0278   0.444 
2 B               20    17     2    39    0.513   0.436    0.0513
3 C               15     2     7    24    0.625   0.0833   0.292 
4 D               17     4    15    36    0.472   0.111    0.417 
5 E               15    20    14    49    0.306   0.408    0.286 

The result is correct, but if, for example, I were to analyze 100 races, the code I am using now would be very tedious, both to calculate the total population and the percentages. I want to know if there is a way to iterate the process I am currently following with my code.

CodePudding user response:

A possible solution, edited to incorporate the suggestion, offered by @stefan, to use rowSums:

library(tidyverse)

df <- data.frame(
  stringsAsFactors = FALSE,
  Neighborhood = c("A", "B", "C", "D", "E"),
  Race1 = c(19L, 20L, 15L, 17L, 15L),
  Race2 = c(1L, 17L, 2L, 4L, 20L),
  Race3 = c(16L, 2L, 7L, 15L, 14L)
)

df %>% 
  mutate(Total = rowSums(across(starts_with("Race"))),
         across(starts_with("Race"), ~ ./Total, .names = "{.col}_pr"))

#>   Neighborhood Race1 Race2 Race3 Total  Race1_pr   Race2_pr   Race3_pr
#> 1            A    19     1    16    36 0.5277778 0.02777778 0.44444444
#> 2            B    20    17     2    39 0.5128205 0.43589744 0.05128205
#> 3            C    15     2     7    24 0.6250000 0.08333333 0.29166667
#> 4            D    17     4    15    36 0.4722222 0.11111111 0.41666667
#> 5            E    15    20    14    49 0.3061224 0.40816327 0.28571429

CodePudding user response:

Traditional R solution. Obviously the most efficient solution will use rowSums

 df <- cbind(df,   # `cbind` aligns columns side-by-side
                # Now just work on the columns that have "Race" in them
             df[grepl("Race", names(df))]/ rowSums( df[grepl("Race", names(df))]) )
 df

Might need to do some work on the names of the new columns. The setNames function might be useful in that task.

  •  Tags:  
  • r
  • Related