I have a named list of data frames that all contain the same columns, but in some of these data frames some of the columns are empty. What I'm hoping to return is the name of each data frame in the list along with the name(s) of its empty columns.
The reprex below mirrors the process I am using on the full problem:
library(tidyverse)
data("diamonds")
data1 <- diamonds
data1$color <- NA
data1$price <- NA
data2 <- diamonds
data2$carat <- NA
data1$Type <- "data1"
data2$Type <- "data2"
data1 %>%
  bind_rows(data2) -> dataFull
dataSplit <- split(dataFull, f = dataFull$Type)
for(i in dataSplit){
  which(sapply(dataSplit[[i]], function(x) all(is.na(x))))
}
My hope is to return something like:
data1: price, color
data2: carat
I've tried the very basic for-loop included above, but loops are admittedly not my strong suit.
Answer:
Your sapply idea was right, but you need to use its logical output to subset the names of each data frame. Also, since you are loading the tidyverse, you may as well use map instead of a loop for brevity:
map(dataSplit, ~ names(.x)[sapply(.x, \(x) all(is.na(x)))])
#> $data1
#> [1] "color" "price"
#>
#> $data2
#> [1] "carat"
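If you would rather keep the for-loop from the question, it fails for two reasons: for (i in dataSplit) makes i each data frame rather than its name, so dataSplit[[i]] cannot be used as a subscript, and a value computed inside a loop body is neither printed nor kept unless you assign it somewhere. Below is a minimal sketch of a working loop. It runs on a small made-up stand-in for dataSplit so that it needs only base R; the toy data and the name emptyCols are hypothetical.

```r
# Hypothetical toy stand-in for dataSplit (not the diamonds data):
# data1 has empty color and price columns, data2 has an empty carat column
dataSplit <- list(
  data1 = data.frame(carat = c(0.23, 0.21), color = NA, price = NA,
                     Type = "data1"),
  data2 = data.frame(carat = NA, color = c("E", "E"), price = c(326, 326),
                     Type = "data2")
)

# Preallocate a named list to hold one result per data frame
emptyCols <- vector("list", length(dataSplit))
names(emptyCols) <- names(dataSplit)

# Loop over the names, so dataSplit[[i]] picks out one data frame
for (i in names(dataSplit)) {
  df <- dataSplit[[i]]
  # TRUE for each column whose values are all NA
  isEmpty <- sapply(df, function(x) all(is.na(x)))
  # Store the result instead of discarding it
  emptyCols[[i]] <- names(df)[isEmpty]
}

emptyCols
```

The loop does the same work as the map() one-liner; the main difference is that you have to create and fill the result list yourself.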
Answer:
lapply(dataSplit, function(x) {
  cn <- colnames(x)
  # apply() converts x to a matrix, then tests each column (MARGIN = 2)
  isempty <- apply(x, 2, function(col) is.na(col) |> all())
  cn[isempty]
})
$data1
[1] "color" "price"
$data2
[1] "carat"
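One caveat with apply() here: it calls as.matrix() on the data frame first, and when the columns include text or factors, as in diamonds, the whole data frame becomes a character matrix before any column is tested. The NA check still works, but the conversion copies all the data. A sketch of the same idea using vapply(), which visits each column as-is and insists on one logical value per column, again on a hypothetical toy stand-in for dataSplit:

```r
# Hypothetical toy stand-in for dataSplit (not the diamonds data)
dataSplit <- list(
  data1 = data.frame(carat = c(0.23, 0.21), color = NA, price = NA,
                     Type = "data1"),
  data2 = data.frame(carat = NA, color = c("E", "E"), price = c(326, 326),
                     Type = "data2")
)

emptyCols <- lapply(dataSplit, function(x) {
  # vapply() iterates over the columns without a matrix conversion and
  # errors if any column does not yield exactly one logical value
  isEmpty <- vapply(x, function(col) all(is.na(col)), logical(1))
  names(x)[isEmpty]
})

emptyCols
```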
Answer:
Using select() with where():
library(dplyr)
library(purrr)
map(dataSplit, ~ .x %>%
      select(where(~ all(is.na(.x)))) %>%
      names())
$data1
[1] "color" "price"
$data2
[1] "carat"
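All of these return a named list rather than the "data1: price, color" layout sketched in the question. Given such a list, a minimal way to print one line per data frame is to collapse each character vector with paste(). The emptyCols value below is hypothetical and simply hard-codes the result shown above.

```r
# Hypothetical result in the shape returned by the answers above
emptyCols <- list(data1 = c("color", "price"), data2 = "carat")

# Collapse each vector of column names into "col1, col2",
# then prefix it with the data frame's name
out <- paste0(names(emptyCols), ": ",
              vapply(emptyCols, paste, character(1), collapse = ", "))

cat(out, sep = "\n")
```

Columns appear in their original order, so data1 prints as "data1: color, price". A data frame with no empty columns would print as its name followed by ": " and nothing else, because paste(character(0), collapse = ", ") returns "".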
Or in base R:
lapply(dataSplit, \(x) names(x)[!colSums(!is.na(x))])
$data1
[1] "color" "price"
$data2
[1] "carat"
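The base R one-liner packs three steps together. On a small hypothetical data frame where only column b is entirely NA, the steps look like this: is.na() on a data frame returns a logical matrix, colSums() counts the non-missing values per column, and negating those counts turns 0 into TRUE and any positive count into FALSE.

```r
# Hypothetical three-row data frame: only column b is entirely NA
x <- data.frame(a = c(1, NA, 3), b = NA, c = c("u", "v", NA))

notNA <- !is.na(x)          # logical matrix, TRUE where a value is present
nPresent <- colSums(notNA)  # non-missing count per column: a = 2, b = 0, c = 2
emptyNames <- names(x)[!nPresent]  # !0 is TRUE, any other count is FALSE

emptyNames
```

Here emptyNames is "b": a and c each have one NA but also some values, so only b counts as empty.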