NA values in vectors passed to map_dfr

Time:09-23

I'm trying to pass vectors, each with a different number of NA values, to purrr::map_dfr(), but it returns an error.

I have a tibble of N numeric columns and 1 categorical column. For each numeric column, I want to compare its distributions against each other, split by the values of the categorical column. I use overlapping::overlap() to calculate the overlap of the distributions, and I feed the numeric column names into map_dfr() for the iteration. For example:

require(overlapping)
require(dplyr)
require(purrr)

set.seed( 1 )
n <- 100
G1 <- sample( 0:30, size = n, replace = TRUE )
G2 <- sample( 0:30, size = n, replace = TRUE, prob = dbinom( 0:30, 31, .55 ))
G3 <- sample( 0:30, size = n, replace = TRUE, prob = dbinom( 0:30, 41, .65 ))
Data <- data.frame(y = G1, x = G2, z = G3, group = rep(c("G1","G2", "G3"), each = n), class = rep(c("C1","C2", "C3"), each = 1)) %>% as_tibble()
Data 

overlap_fcn <- function(.x) {
    ## construct list of vectors
    dist_list <- list(
        "C1" = Data %>%
            filter(class == 'C1', !is.na(.x)) %>%
            pull(.x),
        "C2" = Data %>%
            filter(class == 'C2', !is.na(.x)) %>%
            pull(.x),
        "C3" = Data %>%
            filter(class == 'C3', !is.na(.x)) %>%
            pull(.x)
    )
    ## calculate distribution overlaps
    return(
        enframe(
            overlapping::overlap(dist_list)$OV * 100
        ) %>%
            mutate(value = paste0(round(value, 2), "%"),
                   class = .x) %>%
            rename(comparison = name, overlap = value) %>%
            relocate(class)
    )
}

overlap_table <- purrr::map_dfr(
  .x = c('y', 'x', "z"),
  .f = ~overlap_fcn(.x))

overlap_table

The above works as intended. However, in practice I have different amounts of missingness in each of x, y, and z. I try to account for this with the filter on !is.na(.x), but it's not working. For example:

Data$x[1:3] <- NA
Data$y[10:20] <- NA
Data$z[100:150] <- NA

overlap_table <- purrr::map_dfr(
  .x = c('x', 'y', "z"),
  .f = ~overlap_fcn(.x))

returns this error:

Error in density.default(x[[j]], n = nbins, ...): 'x' contains missing values
Traceback:
1. purrr::map_dfr(.x = c("x", "y", "z"), .f = ~overlap_fcn(.x))
2. map(.x, .f, ...)
3. .f(.x[[i]], ...)
4. overlap_fcn(.x)
5. enframe(overlapping::overlap(dist_list)$OV * 100) %>% mutate(value = paste0(round(value, 
 .     2), "%"), class = .x) %>% rename(comparison = name, overlap = value) %>% 
 .     relocate(class)   # at line 25-33 of file <text>
6. relocate(., class)
7. rename(., comparison = name, overlap = value)
8. mutate(., value = paste0(round(value, 2), "%"), class = .x)
9. enframe(overlapping::overlap(dist_list)$OV * 100)
10. overlapping::overlap(dist_list)
11. density(x[[j]], n = nbins, ...)
12. density.default(x[[j]], n = nbins, ...)
13. stop("'x' contains missing values")

Can anyone help me out here, please? I'm sure it's something super obvious I'm missing; I just can't see what!

CodePudding user response:

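Here, .x is a character string, so !is.na(.x) tests the string itself (for example "x"), which is never NA, and filter() keeps every row. pull(.x) still works because pull() accepts a column name given as a string. Convert the string to a symbol with rlang::sym() and unquote it with !! so that filter() evaluates the column: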

overlap_fcn <- function(.x) {
    ## construct list of vectors, dropping rows where column .x is NA
    dist_list <- list(
        "C1" = Data %>%
            filter(class == 'C1', !is.na(!!rlang::sym(.x))) %>%
            pull(.x),
        "C2" = Data %>%
            filter(class == 'C2', !is.na(!!rlang::sym(.x))) %>%
            pull(.x),
        "C3" = Data %>%
            filter(class == 'C3', !is.na(!!rlang::sym(.x))) %>%
            pull(.x)
    )
    ## calculate distribution overlaps (as percentages)
    return(
        enframe(
            overlapping::overlap(dist_list)$OV * 100
        ) %>%
            mutate(value = paste0(round(value, 2), "%"),
                   class = .x) %>%
            rename(comparison = name, overlap = value) %>%
            relocate(class)
    )
}

Testing after creating the NAs in Data:

> purrr::map_dfr(
    .x = c('x', 'y', "z"),
    .f = ~overlap_fcn(.x))
# A tibble: 9 × 3
  class comparison overlap
  <chr> <chr>      <chr>  
1 x     C1-C2      98.61% 
2 x     C1-C3      97.46% 
3 x     C2-C3      97.5%  
4 y     C1-C2      95.47% 
5 y     C1-C3      96.22% 
6 y     C2-C3      97.14% 
7 z     C1-C2      90.17% 
8 z     C1-C3      94.9%  
9 z     C2-C3      89.24% 
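
To see why the original filter dropped nothing, here is a minimal sketch on a small made-up tibble (toy and col are illustrative names, not part of the question's Data). It also shows dplyr's .data pronoun, which may be used as an alternative to !! rlang::sym() when the column name is held in a string.

```r
library(dplyr)

# Toy data for illustration only (not the Data from the question)
toy <- tibble(
  x     = c(1, NA, 3, NA),
  class = c("C1", "C1", "C2", "C2")
)

col <- "x"  # column name held as a string, as .x is inside overlap_fcn()

# is.na("x") is FALSE, so the condition is TRUE for every row:
# filter() keeps all rows and the NAs survive
kept_string <- filter(toy, !is.na(col))[[col]]

# .data[[col]] looks the column up by name, so rows with NA are dropped
kept_column <- filter(toy, !is.na(.data[[col]]))[[col]]

kept_string  # still contains the two NA values
kept_column  # 1 3
```

Inside overlap_fcn(), the same idea would read filter(class == 'C1', !is.na(.data[[.x]])); whether to use that or !! rlang::sym(.x) is largely a matter of taste.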