R: Counting Overall Percentage of 0's in Data

Time:12-28

I am working with the R programming language.

In the following link (https://www.geeksforgeeks.org/how-to-find-the-percentage-of-missing-values-in-a-dataframe-in-r/), I found a method to calculate the total percentage of NA's in a data frame:

# declaring a data frame in R
data_frame = data.frame(C1 = c(1, 2, NA, 0),
                        C2 = c(NA, NA, 3, 8),
                        C3 = c("A", "V", "j", "y"),
                        C4 = c(NA, NA, NA, NA))

percentage = mean(is.na(data_frame)) * 100

[1] 43.75
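
For context, this works because is.na() applied to a data frame returns a logical matrix, and the mean of a logical is the share of TRUE values. A minimal sketch of the same calculation spelled out step by step (the names na_mask, n_missing and n_cells are only illustrative):

```r
# Same example data as above
data_frame <- data.frame(C1 = c(1, 2, NA, 0),
                         C2 = c(NA, NA, 3, 8),
                         C3 = c("A", "V", "j", "y"),
                         C4 = c(NA, NA, NA, NA))

na_mask   <- is.na(data_frame)        # 4 x 4 logical matrix, TRUE where a cell is NA
n_missing <- sum(na_mask)             # TRUE counts as 1, FALSE as 0
n_cells   <- prod(dim(data_frame))    # rows * columns

percentage <- n_missing / n_cells * 100
print(percentage)
```

Any other test that yields a logical value per cell can be averaged in the same way.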

My Question: Is there a way to extend this to count the percentage of "any element" in the data frame?

For example, can this be used to calculate the percentage of 0's in the data set? Or the percentage of times "j" appears in the data? Or the percentage of times "2" appears in the data set?

I can do this manually:

# count percentage of "j" in the data 

v1 = nrow(subset(data_frame, C1 == "j"))
v2 = nrow(subset(data_frame, C2 == "j"))
v3 = nrow(subset(data_frame, C3 == "j"))
v4 = nrow(subset(data_frame, C4 == "j"))

percentage = ((v1 + v2 + v3 + v4) / ((nrow(data_frame) * ncol(data_frame)))) * 100

[1] 6.25

# count percentage of 0 in the data (the comparison needs "==", not "="; subset(data_frame, C1 = 0) filters nothing and returns every row)

v1 = nrow(subset(data_frame, C1 == 0))
v2 = nrow(subset(data_frame, C2 == 0))
v3 = nrow(subset(data_frame, C3 == 0))
v4 = nrow(subset(data_frame, C4 == 0))

percentage = ((v1 + v2 + v3 + v4) / ((nrow(data_frame) * ncol(data_frame)))) * 100

But is there a faster way to do this?

Thanks!

CodePudding user response:

You can unlist your data frame into a single vector and take the mean of the matches:

vec = unlist(data_frame)

mean(vec %in% "j") * 100 # 6.25
mean(vec %in% "0") * 100 # 6.25
mean(vec %in% NA)  * 100 # 43.75
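
One thing to be aware of: because C3 is a character column, unlist() coerces every value to character, which is why 0 is matched as "0" here. If you would rather keep each column's own type, a per-column count is one possible variant. This is only a sketch, and pct_by_column is a made-up helper name, not an existing function:

```r
# Same example data as in the question
data_frame <- data.frame(C1 = c(1, 2, NA, 0),
                         C2 = c(NA, NA, 3, 8),
                         C3 = c("A", "V", "j", "y"),
                         C4 = c(NA, NA, NA, NA))

# Hypothetical helper: count matches column by column, then divide by the
# total number of cells. %in% never returns NA, so no na.rm is needed.
pct_by_column <- function(df, val) {
  per_col <- vapply(df, function(col) sum(col %in% val), integer(1))
  sum(per_col) / (nrow(df) * ncol(df)) * 100
}

print(pct_by_column(data_frame, "j"))
print(pct_by_column(data_frame, 0))
print(pct_by_column(data_frame, NA))
```

The intermediate per_col vector also shows which columns the matches come from, if that is useful.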

CodePudding user response:

Here is a tidyverse solution (it still uses base R's unlist() and mean() at the end).

library(tidyverse)

data_frame %>%
  mutate(across(everything(), ~ .x %in% "j")) %>%
  unlist() %>%
  mean() * 100

Output

[1] 6.25

This could easily be turned into a function:

calc <- function(df, val) {
  df %>%
    mutate(across(everything(), ~ .x %in% val)) %>%
    unlist() %>%
    mean() * 100
}

Usage

calc(data_frame, "j") # 6.25
calc(data_frame, "0") # 6.25
calc(data_frame, NA) # 43.75
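
Since %in% accepts a vector on its right-hand side, the same pattern can also count several values at once. And if the goal is the percentage of every distinct element, table() with prop.table() gives all of them in one go. A rough base R sketch, assuming the same example data (pct_j_or_zero and pct_table are illustrative names):

```r
# Same example data as in the question
data_frame <- data.frame(C1 = c(1, 2, NA, 0),
                         C2 = c(NA, NA, 3, 8),
                         C3 = c("A", "V", "j", "y"),
                         C4 = c(NA, NA, NA, NA))

# Flatten to one vector (coerced to character because C3 is character)
vec <- unlist(data_frame, use.names = FALSE)

# Share of cells equal to "j" or "0"
pct_j_or_zero <- mean(vec %in% c("j", "0")) * 100

# Share of every distinct value, with NA kept as its own category
pct_table <- prop.table(table(vec, useNA = "ifany")) * 100

print(pct_j_or_zero)
print(pct_table)
```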