How do I use the pipe operator or something related to break a pipeline into two steps?

Time:02-11

Is there a way to break this pipeline into two steps, so that ml_logistic_regression() can be applied separately to flights_pipeline?

Below is working code for the pipeline:

flights_pipeline <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(
    tbl = df
    ) %>%
  ft_binarizer(
    input_col = "dep_delay",
    output_col = "delayed",
    threshold = 15
  ) %>%
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "hours",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  )  %>%
  ft_r_formula(delayed ~ month + day + hours + distance) %>%
  ml_logistic_regression()

This is my attempt. I'd like to break it into two steps, something like this:

flights_pipeline <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(
    tbl = df
    ) %>%
  ft_binarizer(
    input_col = "dep_delay",
    output_col = "delayed",
    threshold = 15
  ) %>%
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "hours",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  )  %>%
  ft_r_formula(delayed ~ month + day + hours + distance)

flights_pipeline_with_model <- flights_pipeline %>% 
  ml_logistic_regression()

Answer:

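It's not entirely clear from the description and the second code block what is being asked. If the intention is to save an object partway through the pipe and then continue piping, perhaps pipeR could help. Its %>>% operator assigns the value piped so far to a name when the step is written as (~name):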

library(pipeR)
ml_pipeline(sc) %>%
  ft_dplyr_transformer(
    tbl = df
    ) %>%
  ft_binarizer(
    input_col = "dep_delay",
    output_col = "delayed",
    threshold = 15
  ) %>%
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "hours",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  )  %>%
  ft_r_formula(delayed ~ month + day + hours + distance) %>>%
  (~flights_pipeline) %>>%  
  ml_logistic_regression()
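
As a side note, a pipe chain is just a sequence of function calls, so saving the intermediate result and piping it onward should give the same result as one long chain. Here is a minimal sketch of that idea using plain vectors and magrittr instead of Spark. The helpers add_one() and double_it() are made up for illustration and are not part of sparklyr:

```r
library(magrittr)

# Hypothetical stand-ins for pipeline stages (not sparklyr functions)
add_one   <- function(x) x + 1
double_it <- function(x) x * 2

# One long chain
one_step <- c(1, 2, 3) %>% add_one() %>% double_it()

# The same chain split in two: save the intermediate, then keep piping
first_half <- c(1, 2, 3) %>% add_one()
two_step   <- first_half %>% double_it()

# Both routes produce the same value
identical(one_step, two_step)
```

By the same reasoning, the two-step version in the question should behave like the single chain, since ml_logistic_regression() receives the pipeline as its first argument in both cases. This has not been checked against a live Spark connection here.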