R: check whether NA is found in any column and tabulate accordingly

I have a dataset in CSV format that contains 10 thousand rows. Below are the top 6 rows, including the header row. My requirements are listed below.

I want to check how many NA values there are in every column: Name, Sex, Age and Birth.
If a column has NA values, I will create a data frame and load the column values into it, including the NA values.
If a column has no NA values, I will create a data frame and load the column values into it without an NA entry.
I need to create 4 data frames, one for each of the 4 columns.

Here is what the data looks like:

Name    Sex    Age    Birth
James   M      20     Africa
Lim     F      NA     London
NA      M      25     Australia
Alice   NA     27     Britain
Brown   F      29     USA

I have listed down the R code below.

headers <- colnames(data) # Store the column names of the data frame in the headers vector
for (i in headers) # Loop over the column names
{
    print(sum(is.na(data[[i]]))) # Count the NA values in the current column (data$i would look for a column literally named "i")
    if (sum(is.na(data[[i]])) > 0) # A count above 0 means the column contains NA values
        tab <- table(data[i], useNA = "always") # Tabulate the column, including the NA count
    else
        tab <- table(data[i]) # Tabulate the column without an NA entry
}

CodePudding user response:

I think you need useNA = "ifany" (see ?table for the available options) instead of a conditional, and you can simplify your code into one expression:

lapply(dat, table, useNA = "ifany")
# $Name
# Alice Brown James   Lim  <NA> 
#     1     1     1     1     1 
# $Sex
#    F    M <NA> 
#    2    2    1 
# $Age
#   20   25   27   29 <NA> 
#    1    1    1    1    1 
# $Birth
#    Africa Australia   Britain    London       USA 
#         1         1         1         1         1

Data

dat <- structure(list(Name = c("James", "Lim", NA, "Alice", "Brown"), Sex = c("M", "F", "M", NA, "F"), Age = c(20L, NA, 25L, 27L, 29L), Birth = c("Africa", "London", "Australia", "Britain", "USA")), class = "data.frame", row.names = c(NA, -5L))
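
The question also asks for the number of NA values per column and for one data frame per column, which the one-liner above does not return directly. The following is a minimal sketch of one way to get both, assuming the same sample data; the names na_counts and freq_frames and the column labels value and n are made up for illustration and do not come from the original post.

```r
# Sample data mirroring the rows shown in the question
dat <- data.frame(
  Name  = c("James", "Lim", NA, "Alice", "Brown"),
  Sex   = c("M", "F", "M", NA, "F"),
  Age   = c(20L, NA, 25L, 27L, 29L),
  Birth = c("Africa", "London", "Australia", "Britain", "USA"),
  stringsAsFactors = FALSE
)

# Number of NA values in each column (named numeric vector)
na_counts <- colSums(is.na(dat))
print(na_counts)

# One frequency data frame per column (hypothetical names: value, n).
# useNA = "ifany" adds an NA row only for columns that contain NA.
freq_frames <- lapply(dat, function(col) {
  as.data.frame(
    table(value = col, useNA = "ifany"),
    responseName = "n",
    stringsAsFactors = FALSE
  )
})

# Access a single column's data frame by name
print(freq_frames$Sex)
```

Keeping the four data frames in a named list, rather than in four separate variables, means a column can be looked up by name (freq_frames$Age) and the code does not change if the CSV gains or loses columns.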