Home > Software design >  Count the absolute number and the percentage of NAs for every column in a dataframe
Count the absolute number and the percentage of NAs for every column in a dataframe

Time:12-15

I have the dataframe below and I want to create a new dataframe based on this with 3 columns. The first will be named "Field" and will contain the column names of this dataframe (col1,col2,col3). The second will be named "Absolute" and will contain the absolute number of missing values of this column and the third will be named Percentage and will contain the percentage of the missing values of this column. The number of columns and rows in my real dataframe is bigger.

col1<-c("as","df",NA)
col2<-c("ds",NA,NA)
col3<-c(NA,NA,NA)
df<-data.frame(col1,col2,col3)

CodePudding user response:

Try,

data.frame(field = names(df), 
           Absolute = colSums(is.na(df)), 
           Percentage = 100 * (colSums(is.na(df)) / nrow(df)),
           row.names = seq(nrow(df)))

     field Absolute Percentage
   1  col1        1   33.33333
   2  col2        2   66.66667
   3  col3        3  100.00000

CodePudding user response:

Tidyverse approach (though I prefer @Sotos’ solution):

library(dplyr)
library(tidyr)

df %>%
  summarize(across(
    everything(),
    list(
      Absolute = ~ sum(is.na(.x)), 
      Percentage = ~ mean(is.na(.x)) * 100
    )
  )) %>%
  pivot_longer(
    everything(),
    names_to = c("Field", ".value"),
    names_sep = "_"
  )
# A tibble: 3 × 3
  Field Absolute Percentage
  <chr>    <int>      <dbl>
1 col1         1       33.3
2 col2         2       66.7
3 col3         3      100  
  •  Tags:  
  • r
  • Related