Using map_dbl with a nested df not accessing the dataframe correctly?

Time:10-21

I'm working on a project where I need to find the distance between a bunch of behaviors that are measured in 3-dimensional space and a pre-identified point in 3-dimensional space. I wrote a function to calculate the distance between the point and a single behavior, which works when I apply it to only one behavior. However, I need to apply it to ~750 behaviors in a larger data frame. So I am hoping to nest the larger behaviors data frame by term and then apply the function to each one of those nested dataframes using map_dbl. However, I keep getting the error:

Error: Problem with mutate() column distance.
ℹ distance = map_dbl(data, calc_distance_from_beh).
x Join columns must be present in data.
x Problem with dim.
ℹ The error occurred in row 1.

It seems that when map_dbl is applied to the nested data frames, the function cannot access the "dim" column to join on, and I'm not sure why.

I've included a reproducible example below with just two behaviors.

Reproducible example:

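library(dplyr)  # tibble(), as_tibble(), left_join(), nest_by(), mutate()
library(purrr)  # map_dbl()

behaviors <- tibble(term = rep(c("abandon", "abet"), each = 3),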
                   estimate = c(-3.31, -0.08, -0.11, 0.03, 0.34, -0.18),
                   dim = c("E", "P", "A", "E", "P", "A"))

optimal_behavior <- tibble(actor = "civil_engineer",
                          object = "civil_engineer",
                          opt_beh = c(1.905645, 0.9960085, -0.17772678),
                          dim = c("E", "P", "A"))


calc_distance_from_beh <- function(nested_df){
  
      optimal_behavior <- as_tibble(optimal_behavior)
      nested_df <- as_tibble(nested_df)
      
      df_for_calculations <- left_join(optimal_behavior, nested_df, by = "dim")
      
      df_for_calculations %>% 
            mutate(dist = (estimate-opt_beh)^2) %>% 
            summarise(total_dist = sum(dist)) %>% 
        pull()
}


behaviors_distance <- behaviors %>% 
                      nest_by(term) %>% 
                      mutate(distance = map_dbl(data, calc_distance_from_beh))

Answer:

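The join fails because nest_by() returns a rowwise data frame. Inside mutate() on a rowwise data frame, data refers to the single nested tibble for the current row, so map_dbl() iterates over that tibble's columns and passes each one to the function as a bare vector. as_tibble() then turns that vector into a one-column tibble named value, which has no dim column to join on. Calling ungroup() after nest_by() drops the rowwise attribute, so map_dbl() iterates over the list of nested tibbles as intended: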

library(purrr)
library(dplyr)
behaviors %>% 
          nest_by(term) %>% 
          ungroup() %>%
          mutate(distance = map_dbl(data, calc_distance_from_beh))
# A tibble: 2 × 3
  term                  data distance
  <chr>   <list<tibble[,2]>>    <dbl>
1 abandon            [3 × 2]    28.4 
2 abet               [3 × 2]     3.95
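
To see why the rowwise attribute gets in the way, here is a minimal sketch. The object one_group is a made-up stand-in for the nested tibble of a single term; it is not part of the original code. Mapping over a tibble visits its columns one at a time, and a bare column converted with as_tibble() no longer carries a dim column:

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for the nested tibble of a single term
one_group <- tibble(estimate = c(-3.31, -0.08, -0.11),
                    dim = c("E", "P", "A"))

# Mapping over a tibble visits one column at a time, not the tibble as a whole
column_classes <- map_chr(one_group, class)
column_classes

# A bare numeric column converted with as_tibble() has no "dim" column,
# so left_join(..., by = "dim") has nothing to join on
bare_column_names <- names(as_tibble(one_group$estimate))
bare_column_names
```

This mirrors what happens in the failing call: the function receives the estimate column on its own instead of the full nested tibble.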

Alternatively, skip map_dbl() entirely: because the data frame is rowwise, mutate() can call the function directly on each row's data

behaviors %>%
    nest_by(term) %>%
    mutate(distance = calc_distance_from_beh(data)) %>%
    ungroup()

-output

# A tibble: 2 × 3
  term                  data distance
  <chr>   <list<tibble[,2]>>    <dbl>
1 abandon            [3 × 2]    28.4 
2 abet               [3 × 2]     3.95
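
If the nested data column is not needed afterwards, the same numbers can probably be obtained without nesting at all. The following is only a sketch of that alternative, assuming the example data from the question: it joins the optimal behavior onto the long data frame once, then sums the squared differences per term.

```r
library(dplyr)

# Example data copied from the question
behaviors <- tibble(term = rep(c("abandon", "abet"), each = 3),
                    estimate = c(-3.31, -0.08, -0.11, 0.03, 0.34, -0.18),
                    dim = c("E", "P", "A", "E", "P", "A"))

optimal_behavior <- tibble(actor = "civil_engineer",
                           object = "civil_engineer",
                           opt_beh = c(1.905645, 0.9960085, -0.17772678),
                           dim = c("E", "P", "A"))

# Join once on dim, then sum the squared differences within each term
behaviors_distance <- behaviors %>%
  left_join(select(optimal_behavior, dim, opt_beh), by = "dim") %>%
  group_by(term) %>%
  summarise(distance = sum((estimate - opt_beh)^2))

behaviors_distance
```

As in calc_distance_from_beh, the result is a sum of squared differences; take sqrt() of it if the Euclidean distance itself is wanted.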