How to get proportion of specific value across columns?

Time:02-16

I have a sample dataframe as below:

   self race1 race2 race3 race4
      1     1     2     2     1
      2     1     1     1     1
      3     1     3     1     1
      4     2     1     3     1

I would like to add a new column holding the proportion of 1s in the race columns. For each row, I would count the number of 1s across race1 to race4 and divide that count by 4. The desired output dataframe would look like this:

   self race1 race2 race3 race4 prop_race_as1
      1     1     2     2     1           2/4
      2     1     1     1     1           4/4
      3     1     3     1     1           3/4
      4     2     1     3     1           2/4

How do I write a function that incorporates rowwise() to get the desired output?

CodePudding user response:

Assuming your data is in df and every column except the first (self) is a race column, you can compute the ratios with

ratios <- apply(data.matrix(df)[,-1], 1, function(x) length(which(x == 1)) / (ncol(df)-1))

and then attach them to the data with cbind(df, ratios).
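
The same calculation can probably be written without apply(). The following is a minimal sketch, assuming the sample data from the question and that self is the only non-race column. It relies on the fact that the mean of a logical vector is the share of TRUE values.

```r
# Sample data copied from the question
df <- data.frame(
  self  = 1:4,
  race1 = c(1L, 1L, 1L, 2L),
  race2 = c(2L, 1L, 3L, 1L),
  race3 = c(2L, 1L, 1L, 3L),
  race4 = c(1L, 1L, 1L, 1L)
)

# df[-1] drops the first column (self). Comparing a data frame with a
# scalar gives a logical matrix, and rowMeans() of that matrix is the
# proportion of 1s in each row.
ratios <- rowMeans(df[-1] == 1)

cbind(df, ratios)
```

Because the divisor comes from the number of selected columns, nothing needs to change if a fifth race column is added later.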

CodePudding user response:

Here are two possibilities.

Reprex

1. With dplyr (and rowwise())

  • Code
library(dplyr)

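# Compare with == 1 so that only 1s are counted (a "< 2" test would also
# count 0 or negative codes if they ever appeared in the data).
df %>% 
  rowwise() %>% 
  mutate(prop_race_as1 = sum(c_across(starts_with("race")) == 1) / 4)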
  • Output
#> # A tibble: 4 x 6
#> # Rowwise: 
#>    self race1 race2 race3 race4 prop_race_as1
#>   <int> <int> <int> <int> <int>         <dbl>
#> 1     1     1     2     2     1          0.5 
#> 2     2     1     1     1     1          1   
#> 3     3     1     3     1     1          0.75
#> 4     4     2     1     3     1          0.5

2. Using only base R

  • Code
# == 1 counts only the 1s; the divisor 4 is the number of race columns
df$prop_race_as1 <- rowSums(df[startsWith(names(df), "race")] == 1) / 4
  • Output
df
#>   self race1 race2 race3 race4 prop_race_as1
#> 1    1     1     2     2     1          0.50
#> 2    2     1     1     1     1          1.00
#> 3    3     1     3     1     1          0.75
#> 4    4     2     1     3     1          0.50
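
Both variants above hard-code the target value and the divisor 4. A minimal base R sketch of a reusable version might look like the following. The helper name prop_equal and its arguments are hypothetical and not part of any package, and the choice to drop missing values from the denominator (na.rm = TRUE) is an assumption that may not suit every dataset.

```r
# Hypothetical helper: for each row, the share of columns whose name
# starts with `prefix` and whose value equals `value`.
prop_equal <- function(data, prefix = "race", value = 1) {
  cols <- startsWith(names(data), prefix)
  stopifnot(any(cols))  # fail early if no column matches the prefix
  # Assumption: NA cells are excluded from the denominator
  rowMeans(data[cols] == value, na.rm = TRUE)
}

# Sample data copied from the question
df <- data.frame(
  self  = 1:4,
  race1 = c(1L, 1L, 1L, 2L),
  race2 = c(2L, 1L, 3L, 1L),
  race3 = c(2L, 1L, 1L, 3L),
  race4 = c(1L, 1L, 1L, 1L)
)

# Optional "count/total" label, in the style of the question's desired output
race_cols <- startsWith(names(df), "race")
label <- paste0(rowSums(df[race_cols] == 1), "/", sum(race_cols))

df$prop_race_as1 <- prop_equal(df)
df
```

Calling prop_equal(df, value = 2) would give the share of 2s instead, and a different prefix selects a different family of columns.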

Data

df <- structure(list(self = 1:4, race1 = c(1L, 1L, 1L, 2L), race2 = c(2L, 
1L, 3L, 1L), race3 = c(2L, 1L, 1L, 3L), race4 = c(1L, 1L, 1L, 
1L)), class = "data.frame", row.names = c(NA, -4L))

Created on 2022-02-16 by the reprex package (v2.0.1)
