Count missing values per column in dataframe Julia

Time:12-20

I would like to count the number of missing values per column in a dataframe like df:

using Pkg
Pkg.add("DataFrames")  # only needed the first time
using DataFrames
df = DataFrame(i=1:5,
               x=[missing, 4, missing, 2, 1],
               y=[missing, missing, "c", "d", "e"])

5×3 DataFrame
 Row │ i      x        y       
     │ Int64  Int64?   String? 
─────┼─────────────────────────
   1 │     1  missing  missing 
   2 │     2        4  missing 
   3 │     3  missing  c
   4 │     4        2  d
   5 │     5        1  e

The result should be 0 for column i, 2 for column x and 2 for column y. How can I count the number of missing values per column in Julia?

Answer:

While writing the question I found an answer: call describe with the :nmissing statistic, like this:

describe(df, :nmissing)
3×2 DataFrame
 Row │ variable  nmissing 
     │ Symbol    Int64    
─────┼────────────────────
   1 │ i                0
   2 │ x                2
   3 │ y                2
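Because describe returns an ordinary DataFrame, its result can be post-processed like any other table. A minimal sketch, assuming DataFrames.jl 1.x; the names stats, nmiss and cols_with_missing are illustrative and not part of the answer above:

```julia
using DataFrames

df = DataFrame(i=1:5,
               x=[missing, 4, missing, 2, 1],
               y=[missing, missing, "c", "d", "e"])

# describe returns a DataFrame with a `variable` column (Symbols)
# plus one column per requested statistic, here `nmissing`
stats = describe(df, :nmissing)

# Lookup table: column name (Symbol) => number of missing values
nmiss = Dict(zip(stats.variable, stats.nmissing))

# Names of the columns that contain at least one missing value
cols_with_missing = stats.variable[stats.nmissing .> 0]
```

With the example df, nmiss[:x] is 2 and cols_with_missing holds :x and :y.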

Answer:

If you want the counts laid out as columns (a one-row DataFrame with one column per original column), you can write:

julia> mapcols(x -> count(ismissing, x), df)
1×3 DataFrame
 Row │ i      x      y
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     0      2      2
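The same count(ismissing, x) reduction can also produce plain Julia values instead of a DataFrame. A sketch, assuming DataFrames.jl 1.x; counts_vec and counts_nt are hypothetical names chosen for illustration:

```julia
using DataFrames

df = DataFrame(i=1:5,
               x=[missing, 4, missing, 2, 1],
               y=[missing, missing, "c", "d", "e"])

# One count per column, in column order, as a plain Vector{Int}
counts_vec = [count(ismissing, col) for col in eachcol(df)]

# Attach the column names by building a NamedTuple from the Symbol names
counts_nt = NamedTuple{Tuple(propertynames(df))}(Tuple(counts_vec))
```

The Vector is convenient for further arithmetic, while the NamedTuple keeps each count paired with its column name, e.g. counts_nt.x.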