I need to create my own class that takes a data frame and has the methods 'get_data' to choose the data frame, 'select' to select columns by name, and 'filter' to filter rows with certain values. Select and filter are meant to work somewhat like their dplyr counterparts, but without using dplyr.
I would like to be able to chain them like this:
result <- df_object$get_data(df)$select(col1, col2, period)$filter(period)
What can I do so that the 'filter' method filters the already selected values? At the moment it filters the initial dataset. Also, how can I change the methods so that select and filter don't need a data argument? Please give me some tips; I feel like I'm doing it the wrong way. Do I need to add some fields to the class?
library(R6)
dataFrame <- R6Class("dataFrame",
  list(data = "data.frame"),
  public = list(
    get_data = function(data) {data},
    select_func = function(data, columns) {data[columns]},
    filter_func = function(data, var) {data[var, ]}
  ))
# Create new object
df_object <- dataFrame$new()
# Call methods
df_object$get_data(df)
df_object$select_func(df, c("month", "forecast"))
df_object$filter_func(df[df$month %in% c(1, 2), ])
Answer:
If you want to chain member functions, you need those member functions to return self. This means the R6 object has to modify the data it contains rather than return a new data frame. Since one benefit of R6's reference semantics is avoiding unnecessary copies, I would keep one full copy of the data and have select_func and filter_func update a set of row and column indices instead:
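library(R6)
dataFrame <- R6Class("dataFrame",
  public = list(
    data = data.frame(),
    rows = 0,     # row positions currently kept
    columns = 0,  # column positions currently selected
    initialize = function(data) {
      self$data <- data
      # seq_len() (unlike seq()) also handles a data frame with zero rows
      self$rows <- seq_len(nrow(data))
      self$columns <- seq_along(data)
    },
    # Return only the kept rows and selected columns; drop = FALSE keeps
    # the result a data frame even when a single column is selected
    get_data = function() {
      self$data[self$rows, self$columns, drop = FALSE]
    },
    # Accept column names or positions; names are converted to positions
    select_func = function(cols) {
      if (is.character(cols)) cols <- match(cols, names(self$data))
      self$columns <- cols
      self  # returning self is what makes chaining possible
    },
    # Accept row positions, or a logical vector as long as the full data
    filter_func = function(r) {
      if (is.logical(r)) r <- which(r)
      self$rows <- r
      self
    }
  )
)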
This allows us to chain the filter and select methods:
dataFrame$new(iris)$filter_func(1:5)$select_func(1:2)$get_data()
#> Sepal.Length Sepal.Width
#> 1 5.1 3.5
#> 2 4.9 3.0
#> 3 4.7 3.2
#> 4 4.6 3.1
#> 5 5.0 3.6
Our select method can take column names too:
dataFrame$new(mtcars)$select_func(c("mpg", "wt"))$get_data()
#> mpg wt
#> Mazda RX4 21.0 2.620
#> Mazda RX4 Wag 21.0 2.875
#> Datsun 710 22.8 2.320
#> Hornet 4 Drive 21.4 3.215
#> Hornet Sportabout 18.7 3.440
#> Valiant 18.1 3.460
#> Duster 360 14.3 3.570
#> Merc 240D 24.4 3.190
#> Merc 230 22.8 3.150
#> Merc 280 19.2 3.440
#> Merc 280C 17.8 3.440
#> Merc 450SE 16.4 4.070
#> Merc 450SL 17.3 3.730
#> Merc 450SLC 15.2 3.780
#> Cadillac Fleetwood 10.4 5.250
#> Lincoln Continental 10.4 5.424
#> Chrysler Imperial 14.7 5.345
#> Fiat 128 32.4 2.200
#> Honda Civic 30.4 1.615
#> Toyota Corolla 33.9 1.835
#> Toyota Corona 21.5 2.465
#> Dodge Challenger 15.5 3.520
#> AMC Javelin 15.2 3.435
#> Camaro Z28 13.3 3.840
#> Pontiac Firebird 19.2 3.845
#> Fiat X1-9 27.3 1.935
#> Porsche 914-2 26.0 2.140
#> Lotus Europa 30.4 1.513
#> Ford Pantera L 15.8 3.170
#> Ferrari Dino 19.7 2.770
#> Maserati Bora 15.0 3.570
#> Volvo 142E 21.4 2.780
For completeness, you would need some type safety (for example, select_func currently turns an unknown column name into NA instead of raising an error), and I would also add a reset method to remove all filtering and selection. This effectively gives you a data frame where filtering and selecting are non-destructive, which could be very useful.
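A minimal sketch of those additions follows. It goes beyond the answer above under a few stated assumptions: reset() clears both the row filter and the column selection, select_func() rejects unknown names and out-of-range positions, and filter_func() is changed to take an unquoted condition (evaluated with eval(substitute(...)) against the currently kept rows) that narrows the current rows instead of replacing them. That last change is one way to address the two follow-ups in the question: successive filters combine, and no data argument is needed. The methods return invisible(self), the chaining pattern shown in the R6 documentation, so intermediate calls don't print.

```r
library(R6)

# Sketch only: a revised dataFrame with input checks, a reset() method, and
# a filter_func() that takes an unquoted condition (assumed design choices).
dataFrame <- R6Class("dataFrame",
  public = list(
    data = NULL,
    rows = integer(0),     # row positions currently kept
    columns = integer(0),  # column positions currently selected
    initialize = function(data) {
      if (!is.data.frame(data)) stop("`data` must be a data frame")
      self$data <- data
      self$reset()
    },
    # Remove all row filtering and column selection
    reset = function() {
      self$rows <- seq_len(nrow(self$data))
      self$columns <- seq_along(self$data)
      invisible(self)
    },
    get_data = function() {
      self$data[self$rows, self$columns, drop = FALSE]
    },
    # Column names or positions; anything unknown is an error, not NA
    select_func = function(cols) {
      if (is.character(cols)) {
        unknown <- setdiff(cols, names(self$data))
        if (length(unknown) > 0)
          stop("Unknown column(s): ", paste(unknown, collapse = ", "))
        cols <- match(cols, names(self$data))
      } else if (!is.numeric(cols) || anyNA(cols) ||
                 any(cols < 1 | cols > ncol(self$data))) {
        stop("`cols` must be column names or valid column positions")
      }
      self$columns <- cols
      invisible(self)
    },
    # The condition is evaluated in the currently kept rows (all columns),
    # so successive calls combine like AND; NA results count as FALSE
    filter_func = function(condition) {
      current <- self$data[self$rows, , drop = FALSE]
      keep <- eval(substitute(condition), current, parent.frame())
      if (!is.logical(keep) || length(keep) != nrow(current))
        stop("`condition` must give one TRUE/FALSE per current row")
      self$rows <- self$rows[which(keep)]
      invisible(self)
    }
  )
)

# Usage: two filters narrow the rows, then two columns are selected
obj <- dataFrame$new(iris)
obj$filter_func(Species == "setosa")$filter_func(Sepal.Length > 5.5)$select_func(c("Sepal.Length", "Species"))$get_data()
dim(obj$reset()$get_data())  # back to 150 rows and 5 columns
```

Because the condition is evaluated against all columns of the kept rows, you can filter on a column that is not currently selected. Non-standard evaluation like this is convenient interactively but harder to program with; passing a logical vector, as the original filter_func does, is the more explicit alternative.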
Created on 2022-05-01 by the reprex package (v2.0.1)