I'm a Python user learning R.
Frequently, I need to check whether the columns of a dataframe contain NaN(s).
In Python, I can simply do
import pandas as pd
df = pd.DataFrame({'colA': [1, 2, None, 3],
'colB': ['A', 'B', 'C', 'D']})
df.isna().any()
giving me
colA True
colB False
dtype: bool
In R I'm struggling to find an easy solution. People refer to some apply-like methods but that seems overly complex for such a primitive task. The closest solution I've found is this:
library(tidyverse)
df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))
!complete.cases(t(df))
giving
[1] TRUE FALSE
That's okay-ish, but I don't see the column names. If the dataframe has 50 columns, I don't know which ones have NaNs.
Is there a better R solution?
CodePudding user response:
You can use anyNA, which checks for NA in a vector:
df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))
sapply(df, anyNA)
colA colB
TRUE FALSE
Edit
jay.sf is right. The following checks specifically for NaN values:
df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))
anyNAN <- function(x) {
any(is.nan(x))
}
sapply(df, anyNAN)
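Note that in R, NA (missing) and NaN (not a number) are distinct: is.na() is TRUE for both, while is.nan() is TRUE only for NaN. With the example dataframe above, which contains NA but no NaN, the anyNAN check would therefore report FALSE for both columns. A minimal sketch of the difference, assuming a hypothetical extra column colC (not in the original question) that holds a NaN:

```r
# Hypothetical example data: colA has a missing value (NA),
# colC (added here for illustration only) has a not-a-number value (NaN)
df <- data.frame(colA = c(1, 2, NA, 3),
                 colB = c("A", "B", "C", "D"),
                 colC = c(1, NaN, 3, 4),
                 stringsAsFactors = FALSE)

# anyNA() treats NaN as missing too, so it flags both colA and colC
na_flags <- sapply(df, anyNA)

# is.nan() is TRUE only for NaN, so only colC is flagged
nan_flags <- sapply(df, function(x) any(is.nan(x)))

print(na_flags)
print(nan_flags)
```

If the goal is to mirror pandas' isna(), which treats None and NaN alike, the anyNA version is probably the closer match.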
CodePudding user response:
A simple way to check whether columns have NAs is to loop over the columns with a function that checks any(is.na(x)).
lapply(df, function(x) any(is.na(x)))
$colA
[1] TRUE
$colB
[1] FALSE
I see you loaded the tidyverse but did not use it in your example. If we want to do this within the tidyverse, we can use purrr:
library(purrr)
df %>% map(~any(is.na(.x)))
Or with dplyr:
library(dplyr)
df %>% summarise(across(everything(), ~any(is.na(.x))))
colA colB
1 TRUE FALSE
CodePudding user response:
The easiest way would be:
df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))
is.na(df)
Output:
colA colB
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] TRUE FALSE
[4,] FALSE FALSE
Update, if you only want to see the rows containing NA:
> df[rowSums(is.na(df)) > 0,]
colA colB
3 NA C
Update 2: to get only the column names, each with a flag showing whether the column contains NA (thanks to RSale for anyNA):
> lapply(df, anyNA)
$colA
[1] TRUE
$colB
[1] FALSE
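For a wide dataframe (the 50-column case from the question), it may be more convenient to get a named logical vector and then keep only the names of the affected columns. A minimal base R sketch, assuming the same example dataframe; the variable names has_na and cols_with_na are made up for illustration:

```r
df <- data.frame(colA = c(1, 2, NA, 3),
                 colB = c("A", "B", "C", "D"),
                 stringsAsFactors = FALSE)

# Count NAs per column, then turn the counts into a named logical vector
# (similar in shape to what pandas' df.isna().any() returns)
has_na <- colSums(is.na(df)) > 0

# Keep only the names of the columns that contain at least one NA
cols_with_na <- names(has_na)[has_na]

print(has_na)
print(cols_with_na)
```

Since is.na() is also TRUE for NaN, this flags columns containing either kind of missing value.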