I have a dataset in CSV format that contains 10 thousand rows. I will show you the top 6 rows including the headers. I have listed the requirements below.
- I want to check every column header (Name, Sex, Age and Birth) for how many NA values there are.
- If there are NA values, I will create a dataframe and load the column values into it, including the NA values.
- If there are no NA values, I will create a dataframe and load the column values into it without the NA values.
- I need to create 4 dataframes, one to store each column, as there are 4 columns.
Here is what the data looks like:
Name Sex Age Birth
James M 20 Africa
Lim F NA London
NA M 25 Australia
Alice NA 27 Britain
Brown F 29 USA
I have listed down the R code below.
headers <- colnames(data) # Store the column names of the dataframe in a character vector
for (i in headers) # Loop through the column names
{
  print(sum(is.na(data[[i]]))) # Count the NA values in the current column; [[i]] looks up the column named by i
  if (sum(is.na(data[[i]])) > 0) # A sum greater than 0 means NA values were found
    table <- table(data[i], useNA = "always") # Tabulate the column, including NA
  else
    table <- table(data[i]) # Tabulate the column without NA
}
CodePudding user response:
I think you need useNA = "ifany" (see ?table for the available options) instead of a conditional, and you can simplify your code into one expression:
lapply(dat, table, useNA = "ifany")
# $Name
# Alice Brown James Lim <NA>
# 1 1 1 1 1
# $Sex
# F M <NA>
# 2 2 1
# $Age
# 20 25 27 29 <NA>
# 1 1 1 1 1
# $Birth
# Africa Australia Britain London USA
# 1 1 1 1 1
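If actual dataframes are wanted rather than table objects, each table can probably be converted with as.data.frame. This is only a sketch: the sample data is rebuilt so the snippet runs on its own, and the names freq, value and n are made up for illustration.

```r
# Five-row sample rebuilt here so the snippet is self-contained
dat <- data.frame(
  Name  = c("James", "Lim", NA, "Alice", "Brown"),
  Sex   = c("M", "F", "M", NA, "F"),
  Age   = c(20L, NA, 25L, 27L, 29L),
  Birth = c("Africa", "London", "Australia", "Britain", "USA")
)

# One frequency dataframe per column (hypothetical name: freq).
# useNA = "ifany" adds an NA row only for columns that contain NA.
freq <- lapply(dat, function(x) {
  as.data.frame(table(value = x, useNA = "ifany"), responseName = "n")
})

freq$Sex    # three rows: F, M and NA
freq$Birth  # five rows, no NA row
```

The result is a named list with one dataframe per column, which is usually easier to work with than four separate variables.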
Data
dat <- structure(list(Name = c("James", "Lim", NA, "Alice", "Brown"), Sex = c("M", "F", "M", NA, "F"), Age = c(20L, NA, 25L, 27L, 29L), Birth = c("Africa", "London", "Australia", "Britain", "USA")), class = "data.frame", row.names = c(NA, -5L))
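For the first requirement (how many NA values each column has), colSums(is.na(dat)) should give all the counts at once. If you prefer a loop over the column names, access the column with dat[[i]] rather than dat$i, because $ does not evaluate i and looks for a column literally named "i". A minimal sketch, with the sample data rebuilt so it runs on its own; na_counts is a name made up for illustration.

```r
# Same five-row sample, rebuilt so the snippet is self-contained
dat <- data.frame(
  Name  = c("James", "Lim", NA, "Alice", "Brown"),
  Sex   = c("M", "F", "M", NA, "F"),
  Age   = c(20L, NA, 25L, 27L, 29L),
  Birth = c("Africa", "London", "Australia", "Britain", "USA")
)

# NA count for every column in one call (hypothetical name: na_counts)
na_counts <- colSums(is.na(dat))
na_counts
#  Name   Sex   Age Birth
#     1     1     1     0

# Loop version: [[i]] uses the value of i as the column name
for (i in names(dat)) {
  cat(i, ":", sum(is.na(dat[[i]])), "\n")
}
```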