In R, why is purrr's map2 ignoring my group_by argument when making plots?

Time:01-19

I have many plots that I need to create, broken out by students' level. To do this, I want to use map2 from the purrr package. I want one plot for each level (so 4 plots altogether), but when I use group_by, the code creates one plot for each student (16 unique plots). How do I get my code to make one plot for each level rather than one for each student?

#My data
library(dplyr)
my_data <- tibble(level = c(rep(c("Kindergarten", "1st", "2nd", "3rd"), 4)),
                  id = c(1:16),
                  score = c(81:96))
#My attempt at making one plot per level--makes 16 plots instead of 4

library(purrr)
library(stringr)
library(ggplot2)

#Extract information
levels <- my_data %>% pull(level) %>% as.character
scores <- my_data %>% pull(score)

#Make plots
my_plots <- map2(
  .x = levels,
  .y = scores,
  .f = ~{
    my_data %>%
      group_by(level) %>% # I don't know why this is being ignored
      ggplot(aes(x = .x, y = .y)) +
      geom_point() +
      ggtitle(str_glue("Score by {.x}"))
  }
)

my_plots #has 16 plots (one for each data point) instead of 4 (one for each level with each respective student represented)

CodePudding user response:

Please check the code below. I updated it to use only the map function, iterating over the unique levels, each of which is used to filter my_data.

code

library(purrr)
levels <- my_data$level %>% unique()

#Make plots
my_plots <- map(
  .x = levels,
  .f = ~{
    my_data %>%
      filter(level == .x) %>% # keep only the rows for the current level
      ggplot(aes(x = id, y = score)) +
      geom_point() +
      ggtitle(str_glue("Score by {.x}"))
  }
)

my_plots # now has 4 plots (one for each level, with each respective student represented)
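
As a quick check of why the original map2() call returned 16 plots: map2() calls .f once per element of .x, and the levels vector in the question had 16 elements. group_by() only records grouping metadata on the data frame and leaves all rows in place, so it does not split the data into separate plots. The sketch below uses the question's data; the names all_levels and unique_levels are introduced here purely for illustration.

```r
library(dplyr)
library(purrr)

my_data <- tibble(level = rep(c("Kindergarten", "1st", "2nd", "3rd"), 4),
                  id = 1:16,
                  score = 81:96)

all_levels <- my_data %>% pull(level)  # what the question iterated over
unique_levels <- unique(all_levels)    # what this answer iterates over

n_all <- length(all_levels)        # one call to .f per element
n_unique <- length(unique_levels)  # one call to .f per level

# group_by() keeps every row; it only records the grouping
n_grouped_rows <- nrow(group_by(my_data, level))

# filter() is what actually restricts the rows used for one plot
rows_per_level <- map_int(unique_levels, ~ nrow(filter(my_data, level == .x)))

print(c(n_all = n_all, n_unique = n_unique, n_grouped_rows = n_grouped_rows))
print(rows_per_level)
```

Iterating over the unique levels and filtering inside .f is what brings the result down to one plot per level.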


CodePudding user response:

May I suggest a slightly different approach that saves you a bit of coding: map instead of map2, fewer variables to define, and clearer referencing.

First make a list by splitting your data frame, then create a plot for each.

library(tidyverse)
my_data <- tibble(level = c(rep(c("Kindergarten", "1st", "2nd", "3rd"), 4)),
                  id = c(1:16),
                  score = c(81:96))

my_data %>% 
  group_split(level) %>%
  ## or, which I prefer, with split (it makes a named list)
  # split(.$level) %>%
  map({
    ~ggplot(.x, aes(level, score)) +
      geom_point() +
      ggtitle(str_glue("Score by {unique(.x$level)}"))
  }) %>%
  ## just for demo purpose
  patchwork::wrap_plots()

Created on 2023-01-18 with reprex v2.0.2

CodePudding user response:

Depending on your requirements, I would rather use facets:

library(dplyr)
library(purrr)
library(ggplot2)
library(forcats)
my_data <- tibble(level = c(rep(c("Kindergarten", "1st", "2nd", "3rd"), 4)),
                  id = c(1:16),
                  score = c(81:96))

# fix level order
my_data$level <- fct_inorder(my_data$level)

# Single plot with facets by levels:
my_data %>% 
  ggplot(aes(x = id, y = score)) +
  geom_point() +
  facet_wrap(vars(level))

For separate plots, splitting the data frame itself seems more natural. split() creates a list of data frames named by the values of the factor used for splitting (i.e. the level values in this case). I used imap() here to access the list names through .y.


# Separate plots by levels
library(patchwork)
plots <- imap(split(my_data, my_data$level),
              ~ ggplot(.x, aes(x = id, y = score)) +
                geom_point() +
                ggtitle(.y)) 

# plots patched together to save some space
(plots[[1]] | plots[[2]]) / (plots[[3]] | plots[[4]])

Created on 2023-01-18 with reprex v2.0.2
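
To make the split() and imap() mechanics concrete, here is a small sketch without any plotting. It builds the same named list and uses imap_chr() to show that .x is the per-level data frame and .y is its name. The names by_level and labels are hypothetical, introduced only for this illustration.

```r
library(dplyr)
library(purrr)
library(forcats)

my_data <- tibble(level = rep(c("Kindergarten", "1st", "2nd", "3rd"), 4),
                  id = 1:16,
                  score = 81:96)

# Keep levels in order of first appearance, as in the answer above
my_data$level <- fct_inorder(my_data$level)

# split() returns a list of data frames named by the factor levels
by_level <- split(my_data, my_data$level)
print(names(by_level))

# imap_chr() passes each data frame as .x and its list name as .y
labels <- imap_chr(by_level, ~ paste0(.y, ": ", nrow(.x), " students"))
print(unname(labels))
```

The same .y value is what ggtitle(.y) receives in the plotting code, which is why each plot ends up titled with its level.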
