Is there a way to break this into two steps so that ml_logistic_regression() can be applied separately to flights_pipeline?
Below is working code for the pipeline:
flights_pipeline <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(
    tbl = df
  ) %>%
  ft_binarizer(
    input_col = "dep_delay",
    output_col = "delayed",
    threshold = 15
  ) %>%
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "hours",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  ) %>%
  ft_r_formula(delayed ~ month + day + hours + distance) %>%
  ml_logistic_regression()
This is my attempt. I'd like to break it into two steps, something like this:
flights_pipeline <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(
    tbl = df
  ) %>%
  ft_binarizer(
    input_col = "dep_delay",
    output_col = "delayed",
    threshold = 15
  ) %>%
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "hours",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  ) %>%
  ft_r_formula(delayed ~ month + day + hours + distance)

flights_pipeline_with_model <- flights_pipeline %>%
  ml_logistic_regression()
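For reference, here is a minimal self-contained sketch of the two-step version. In sparklyr, calling an estimator such as ml_logistic_regression() on an ml_pipeline object returns a new pipeline with the estimator appended as the final stage, so the second block above should work once the formula uses + between terms. The sketch assumes a local Spark installation and the nycflights13 package; spark_flights and the contents of df are illustrative assumptions, not taken from the question.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation and the nycflights13 package
sc <- spark_connect(master = "local")
spark_flights <- copy_to(sc, nycflights13::flights, "flights", overwrite = TRUE)

# Hypothetical dplyr transformation for ft_dplyr_transformer();
# only its SQL is captured, no data is collected here
df <- spark_flights %>%
  filter(!is.na(dep_delay)) %>%
  mutate(month = paste0("m", month), day = paste0("d", day)) %>%
  select(dep_delay, sched_dep_time, month, day, distance)

# Step 1: feature-engineering stages only, no model yet
flights_pipeline <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(tbl = df) %>%
  ft_binarizer(input_col = "dep_delay", output_col = "delayed", threshold = 15) %>%
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "hours",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  ) %>%
  ft_r_formula(delayed ~ month + day + hours + distance)

# Step 2: applying the estimator to a pipeline returns a new pipeline
# with the logistic regression appended as the last stage
flights_pipeline_with_model <- flights_pipeline %>%
  ml_logistic_regression()

length(ml_stages(flights_pipeline))             # 4 stages
length(ml_stages(flights_pipeline_with_model))  # 5 stages
```

Fitting then works as for the one-step version, e.g. ml_fit(flights_pipeline_with_model, training_data), where training_data stands for a hypothetical training table. Keeping flights_pipeline as its own object also lets the same feature stages be reused with a different estimator.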
Answer:
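It's not entirely clear from the description and the second code block what the goal is. If the intention is to save an intermediate object in the middle of the pipe and then continue piping, pipeR's assignment syntax (~ name) could help. Note that (~ name) is interpreted by pipeR's %>>% operator, not by magrittr's %>%: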
library(sparklyr)
library(pipeR)

flights_pipeline_with_model <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(
    tbl = df
  ) %>%
  ft_binarizer(
    input_col = "dep_delay",
    output_col = "delayed",
    threshold = 15
  ) %>%
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "hours",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  ) %>%
  ft_r_formula(delayed ~ month + day + hours + distance) %>>%
  # (~ flights_pipeline) needs %>>% on its left; it saves the
  # pipeline at this point (without the model) as flights_pipeline
  (~ flights_pipeline) %>>%
  ml_logistic_regression()
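To see what (~ name) does without a Spark connection, here is a small sketch using only pipeR and base R (total and saved are illustrative names). The value flowing through the pipe is assigned to saved and passed on unchanged, which is how flights_pipeline is captured in the pipeline above before the model stage is added.

```r
library(pipeR)

# (~ saved) assigns the current value to `saved` and passes it on unchanged
total <- c(3, 1, 2) %>>%
  sort() %>>%
  (~ saved) %>>%
  sum()

print(saved)  # [1] 1 2 3
print(total)  # [1] 6
```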