The simplest way to check for NaNs in columns (R)?

Time:01-01

I'm a Python user learning R.

Frequently, I need to check whether the columns of a dataframe contain NaN(s).

In Python, I can simply do:

import pandas as pd
df = pd.DataFrame({'colA': [1,   2,   None, 3], 
                   'colB': ['A', 'B', 'C', 'D']})
df.isna().any()

giving me

colA   True
colB   False
dtype: bool

In R I'm struggling to find an easy solution. People refer to some apply-like methods but that seems overly complex for such a primitive task. The closest solution I've found is this:

library(tidyverse)
df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))
!complete.cases(t(df))

giving

[1]  TRUE FALSE

That's OK-ish, but I don't see the column names. If the dataframe has 50 columns, I can't tell which ones have NaNs.

Is there a better R solution?

CodePudding user response:

You can use anyNA, which checks for NA in a vector:

df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))
sapply(df, anyNA)

colA  colB 
TRUE FALSE 
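
Since the question asks which columns are affected, the named result above can also be reduced to just the column names. A minimal sketch, assuming the same example df; has_na and na_cols are names made up for illustration:

```r
df <- data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))

# Named logical vector: TRUE for every column with at least one NA
has_na <- sapply(df, anyNA)

# Keep only the names of the flagged columns
na_cols <- names(has_na)[has_na]
print(na_cols)
```

With 50 columns this prints only the ones that need attention, rather than a long row of TRUE/FALSE values.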

Edit

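jay.sf is right. The following checks specifically for NaN values; note that is.nan() is FALSE for a plain NA, so colA in this example is reported as FALSE.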

df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))

anyNAN <- function(x) {
  any(is.nan(x))
}

sapply(df, anyNAN)
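
To make the NA versus NaN distinction concrete, here is a small sketch on a made-up data frame (df_nan and any_nan are hypothetical names, not from the question). It assumes you want non-numeric columns to count as "no NaN", hence the is.numeric() guard:

```r
# One column with a NaN, one with a plain NA, one character column
df_nan <- data.frame(colA = c(1, NaN, 3),
                     colB = c(1, NA, 3),
                     colC = c('A', 'B', 'C'))

# TRUE only for numeric columns that contain at least one NaN
any_nan <- function(x) is.numeric(x) && any(is.nan(x))

na_result  <- sapply(df_nan, anyNA)    # anyNA treats NaN as missing too
nan_result <- sapply(df_nan, any_nan)  # only real NaN values count here

print(na_result)
print(nan_result)
```

Here anyNA flags both colA and colB, while the NaN-only check flags colA alone.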

CodePudding user response:

The best way to check whether columns contain NAs is to loop over the columns with a function that tests any(is.na(x)).

lapply(df, function(x) any(is.na(x)))

$colA
[1] TRUE

$colB
[1] FALSE

I can see you loaded the tidyverse but did not use it in your example. If we want to do this within the tidyverse, we can use purrr:

library(purrr)

df %>% map(~any(is.na(.x)))

Or with dplyr:

library(dplyr)

df %>% summarise(across(everything(), ~any(is.na(.x))))

  colA  colB
1 TRUE FALSE

CodePudding user response:

The easiest way would be:

df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))

is.na(df)

Output:

      colA  colB
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,]  TRUE FALSE
[4,] FALSE FALSE
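
Because is.na(df) returns a logical matrix, it can also be collapsed per column with colSums, which keeps the column names. A rough sketch on the same example data; na_counts and has_na are illustrative names only:

```r
df <- data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))

# Number of NA entries in each column (TRUE counts as 1)
na_counts <- colSums(is.na(df))

# Named logical vector: does the column contain any NA at all?
has_na <- na_counts > 0

print(na_counts)
print(has_na)
```

This gives the counts as well as the yes/no answer, at the cost of building the full logical matrix first.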

Update: if you only want to see the rows containing NA:

df[rowSums(is.na(df)) > 0, ]

  colA colB
3   NA    C

Update 2: to list each column name together with whether that column contains NA (thanks to RSale for anyNA):

lapply(df, anyNA)
$colA
[1] TRUE

$colB
[1] FALSE