Home > OS >  Count specific value across multiple columns and divide total values per column with excluding '
Count specific value across multiple columns and divide total values per column with excluding '

Time:08-17

I am working data set consisting of multiple columns with 0,1,2 value. I am trying to compute the number of times either 1 or 2 occurrences across multiple columns and finally would like to divide the total number of times for both 1 and 2.

Below is the data subset:

input

enter image description here

expected result

enter image description here

sorry for inconvenience! as i was new to this

  df <- structure(list(Pool = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L,2L, 2L, 2L, 2L, 2L), a = c(2L, 2L, 2L, 2L, 2L, 2L, 2L,2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L), b = c(2L, 2L, 2L,2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), c = c(0L,0L, 2L, 0L, 1L, 2L, 
    0L, 2L, 2L, 0L, 2L, 2L, 2L, 2L, 2L), d = c(0L,0L, 2L, 0L, 2L, 
    2L, 0L, 2L, 2L, 0L, 2L, 2L, 2L, 2L, 2L), e = c(2L,2L, 2L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L), f = c(2L,2L, 2L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L), g = c(2L,1L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), h =c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L)),row.names = c(NA,15L), class = "data.frame")

If anyone could help for solution, Thank you for your time

CodePudding user response:

You could try this base R approach, though I am sure there are more elegant solutions. From your desired output, it looks like you want to calculate the proportion of 2's out of all 1's and 2's.

Data

Reproducible example data, which from your screenshot seems to be heavily favored for a value of 2

set.seed(123)
df <- data.frame(ID = rep(1:3, each = 5),
                 matrix(sample(0:2, 120, replace = TRUE, prob = c(20/120, 8/120, 92/120)), ncol = 8))
colnames(df)[-1] <- letters[1:8]colnames(df)[-1] <- letters[1:8]

head(df, 5)
#   ID a b c d e f g h
# 1  1 2 0 1 2 2 2 2 0
# 2  1 0 2 0 2 2 2 2 0
# 3  1 2 2 2 2 2 2 2 2
# 4  1 0 2 0 2 2 2 2 2
# 5  1 1 1 2 0 0 2 2 2

Note the numbers differ from your question since the data were provided as a screenshot and this code created data at random.

Code

Splits the data frame by group then applies a function to tabulate and divide the data (a) then recombine everything into a final data frame (b). Also checks to see if there are any values of 2 at all, else returns NA.

a <- lapply(split(df[, -1], df$ID), function(x)
  lapply(x, function(y) {
    t <- data.frame(table(y))
    if (any(t$y == 2)) {
      round(t[t$y == 2, "Freq"] / sum(t[t$y %in% 1:2, "Freq"]), 1)
    }
    else {
      NA
    }
  }))

b <- data.frame(ID = unique(df$ID), do.call(rbind, a))

Output:

# > b
#   ID    a    b    c d e    f   g   h
# 1  1 0.67 0.75 0.67 1 1    1   1   1
# 2  2    1 0.75    1 1 1    1   1 0.6
# 3  3  0.8    1    1 1 1 0.67 0.8 0.8

CodePudding user response:

You could try

library(dplyr)

df %>%
  group_by(id) %>%
  summarise(across(everything(), ~ mean(.x[.x != 0] == 2)))
  • Related