I'd like to have a function that first modifies the data.table (if certain criteria are fulfilled) and then aggregates some values.
df.flights <- flights
setDT(df.flights)
aggregate_flights <- function(x, test = FALSE) {
  if (test == TRUE) {
    df_flights_red <- df.flights[tailnum == "N9EAMQ" | tailnum == "N950UW" | tailnum == "N460WN"]
  } else {
    df_flights_red <- df.flights
  }
  y <- df_flights_red[, .(air_time = sum(air_time, na.rm = TRUE),
                          distance = sum(distance, na.rm = TRUE)),
                      by = .(month, x)]
  return(y)
}
agg <- aggregate_flights(df_flights_red[[tailnum]], TRUE)
I always get the error message that object "df_flights_red" can't be found. It seems that my call of the function isn't correct:
agg <- aggregate_flights(df_flights_red[[tailnum]], TRUE)
How do I have to make this call?
Answer:
The error message makes sense, since you don't have any object named df_flights_red in your global environment. df_flights_red exists only inside the function, so you cannot access it from outside.
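To illustrate the scoping point, here is a minimal base R sketch. The names make_subset, local_result and out are hypothetical and chosen only for this illustration; they are not part of the original code.

```r
# Hypothetical helper, used only to illustrate function scope in R
make_subset <- function() {
  local_result <- c(1, 2, 3)  # this name exists only inside the function
  local_result                # the value is returned, the name is not
}

out <- make_subset()

exists("local_result")  # FALSE: the name was never created outside the function
exists("out")           # TRUE: the returned value was assigned in the calling environment
```

This is why the call has to pass an object that already exists where the function is called, such as df.flights, rather than a name that is only created inside the function body.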
Provided I have understood you correctly, here is what you can use:
- Pass data as first argument to the function. This is not needed but is a good practice.
- Pass column name as string ('tailnum').
- test == TRUE is redundant, use only test.
- A == 'a' | A == 'b' | A == 'c' can be changed to A %in% c('a', 'b', 'c').
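As a rough check of the last point, the following base R sketch compares the two forms on a small made-up vector. The values in tailnum and the names wanted, with_or and with_in are assumptions for illustration only.

```r
# Made-up sample values, including a missing one
tailnum <- c("N9EAMQ", "N123AB", "N950UW", NA)
wanted  <- c("N9EAMQ", "N950UW", "N460WN")

# Chained comparisons: a missing value yields NA
with_or <- tailnum == "N9EAMQ" | tailnum == "N950UW" | tailnum == "N460WN"

# %in% never returns NA: a missing value is simply not a match
with_in <- tailnum %in% wanted

with_or  # TRUE FALSE TRUE NA
with_in  # TRUE FALSE TRUE FALSE
```

The two forms agree for non-missing values. They differ only for NA, which data.table treats as FALSE when subsetting rows, so the same rows should be selected in either case.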
library(data.table)
df.flights <- nycflights13::flights
setDT(df.flights)
aggregate_flights <- function(data, x, test = FALSE) {
  # Optionally restrict the data to three tail numbers
  if (test) {
    df_flights_red <- data[tailnum %in% c("N9EAMQ", "N950UW", "N460WN")]
  } else {
    df_flights_red <- data
  }
  # Group by month plus the column whose name is passed as a string in x
  y <- df_flights_red[, .(air_time = sum(air_time, na.rm = TRUE),
                          distance = sum(distance, na.rm = TRUE)),
                      by = c('month', x)]
  return(y)
}
agg <- aggregate_flights(df.flights, 'tailnum', TRUE)
agg
#    month tailnum air_time distance
# 1:     1  N9EAMQ     2543    15944
# 2:     1  N950UW      597     2608
# 3:     1  N460WN      471     3040
# 4:    10  N9EAMQ     1346     9040
# 5:    10  N460WN      569     3966
# 6:    10  N950UW      728     3404
# 7:    11  N950UW      481     1856
#...
#...
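The grouping step can also be tried without the flights data. The sketch below uses a tiny made-up table; toy, group_col and res are hypothetical names. It assumes only that the data.table package is installed, and it shows the by = c('month', x) pattern from the function above with the column name held in a variable.

```r
library(data.table)

# Tiny made-up table standing in for the flights data
toy <- data.table(
  month    = c(1L, 1L, 2L, 2L),
  tailnum  = c("A", "A", "A", "B"),
  air_time = c(10, NA, 30, 40)
)

# The grouping column is supplied as a string, as in the function above
group_col <- "tailnum"

# by accepts a character vector of column names
res <- toy[, .(air_time = sum(air_time, na.rm = TRUE)),
           by = c("month", group_col)]
res
```

Here res has one row per combination of month and tailnum, and the NA in air_time is dropped by na.rm = TRUE before summing.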