I have the dataframe below and I want to create a new dataframe based on this with 3 columns. The first will be named "Field"
and will contain the column names of this dataframe (col1,col2,col3
). The second will be named "Absolute"
and will contain the absolute number of missing values of this column and the third will be named Percentage
and will contain the percentage of the missing values of this column. The number of columns and rows in my real dataframe is bigger.
col1<-c("as","df",NA)
col2<-c("ds",NA,NA)
col3<-c(NA,NA,NA)
df<-data.frame(col1,col2,col3)
CodePudding user response:
Try,
data.frame(field = names(df),
Absolute = colSums(is.na(df)),
Percentage = 100 * (colSums(is.na(df)) / nrow(df)),
row.names = seq(nrow(df)))
field Absolute Percentage
1 col1 1 33.33333
2 col2 2 66.66667
3 col3 3 100.00000
CodePudding user response:
Tidyverse approach (though I prefer @Sotos’ solution):
library(dplyr)
library(tidyr)
df %>%
summarize(across(
everything(),
list(
Absolute = ~ sum(is.na(.x)),
Percentage = ~ mean(is.na(.x)) * 100
)
)) %>%
pivot_longer(
everything(),
names_to = c("Field", ".value"),
names_sep = "_"
)
# A tibble: 3 × 3
Field Absolute Percentage
<chr> <int> <dbl>
1 col1 1 33.3
2 col2 2 66.7
3 col3 3 100