I am trying to understand how to combine lapply, rbind and do.call in a single statement, but I can't get the statement to run. I have supplied a simple example function and data that I'm using to work out the syntax. I fully understand that this scenario could be run using a simpler method; the purpose is simply to understand the syntax and how to use lapply and rbind with a custom function.
Here's some test data:
facility_id patient_number test_result
123 1000 25
123 1000 30
25 1001 12
25 1002 67
25 1010 75
65 1009 8
22 1222 95
22 1223 89
I'm essentially trying to subset the data inside a custom function using a list of facility id values, and then bind together the data tables that the custom function returns.
Here's the code I've used:
facilities_id_list<-c(123, 25)
facility_counts<-function(facilities_id_list){
facility<-facilities_id_list[[i]]
subset<-data[facility_id==facility]
}
results <- do.call("rbind", lapply(seq_along(facilities_id_list), function(i) facility_counts)
The result I'm hoping to achieve:
facility_id patient_number test_result
123 1000 25
123 1000 30
25 1001 12
25 1002 67
25 1010 75
Why does this not work? Do I need to change the formatting?
CodePudding user response:
Instead of using ==, use %in% for direct subsetting:
subset(data, facility_id %in% facilities_id_list)
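To see why == is the wrong operator when the right-hand side has more than one value, here is a small sketch. It assumes data is a plain data.frame built from the test data in the question (the question does not show how data was created, so that construction is an assumption). With ==, R recycles the shorter vector along the column, so the comparison is positional rather than a membership test.

```r
# Assumed reconstruction of the question's test data as a base data.frame
data <- data.frame(
  facility_id    = c(123, 123, 25, 25, 25, 65, 22, 22),
  patient_number = c(1000, 1000, 1001, 1002, 1010, 1009, 1222, 1223),
  test_result    = c(25, 30, 12, 67, 75, 8, 95, 89)
)
facilities_id_list <- c(123, 25)

# == recycles c(123, 25) along the column: row 1 is compared with 123,
# row 2 with 25, row 3 with 123, and so on, so matching rows are missed
by_equals <- data[data$facility_id == facilities_id_list, ]

# %in% tests each facility_id for membership in the whole vector
by_in <- subset(data, facility_id %in% facilities_id_list)

nrow(by_equals)  # 2
nrow(by_in)      # 5
```

Because the column length (8) is a multiple of the vector length (2), R does not even warn about the recycling here.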
The OP's code has several issues: 1) the function's input argument is facilities_id_list, whereas lapply loops over the index sequence, and the function body refers to an i that is never passed in; 2) facility_id == facility should be data$facility_id == facility, because [ on a data.frame does not look up column names inside data; 3) the row index needs a trailing comma, because without any , the index is taken as a column index on a data.frame.
facility_counts <- function(i) {
  facility <- facilities_id_list[[i]]
  data[data$facility_id == facility, ]
}
> do.call(rbind, lapply(seq_along(facilities_id_list), facility_counts))
facility_id patient_number test_result
1 123 1000 25
2 123 1000 30
3 25 1001 12
4 25 1002 67
5 25 1010 75
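As a variation on the same pattern, lapply can loop over the facility id values themselves instead of over their positions, which removes the index lookup inside the function. This is only a sketch: it assumes data is a base data.frame rebuilt from the question's test data, and the helper name facility_rows is made up for illustration.

```r
# Assumed reconstruction of the question's test data as a base data.frame
data <- data.frame(
  facility_id    = c(123, 123, 25, 25, 25, 65, 22, 22),
  patient_number = c(1000, 1000, 1001, 1002, 1010, 1009, 1222, 1223),
  test_result    = c(25, 30, 12, 67, 75, 8, 95, 89)
)
facilities_id_list <- c(123, 25)

# Hypothetical helper: takes one facility id and returns the matching rows
facility_rows <- function(id) {
  data[data$facility_id == id, ]
}

# lapply returns a list with one data.frame per id;
# do.call(rbind, ...) passes that list as the arguments to a single rbind call
pieces  <- lapply(facilities_id_list, facility_rows)
results <- do.call(rbind, pieces)
results
```

Note that the function is passed to lapply by name and lapply does the calling. Writing function(i) facility_counts, as in the question, returns the function object itself rather than its result.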
CodePudding user response:
Here's an example of using plain old filter, and then another option of using a custom function with do.call():
library(dplyr)
# data
df <- tibble::tribble(
  ~facility_id, ~patient_number, ~test_result,
  123L, 1000L, 25L,
  123L, 1000L, 30L,
  25L, 1001L, 12L,
  25L, 1002L, 67L,
  25L, 1010L, 75L,
  65L, 1009L, 8L,
  22L, 1222L, 95L,
  22L, 1223L, 89L
)
facilities_id_list<-c(123, 25)
# simplest solution: just using filter
df %>%
  filter(facility_id %in% facilities_id_list)
#> # A tibble: 5 × 3
#> facility_id patient_number test_result
#> <int> <int> <int>
#> 1 123 1000 25
#> 2 123 1000 30
#> 3 25 1001 12
#> 4 25 1002 67
#> 5 25 1010 75
# using custom function do.call
custom_filter <- function(data) {
  data %>%
    filter(facility_id %in% facilities_id_list)
}
do.call(custom_filter, list(df))
#> # A tibble: 5 × 3
#> facility_id patient_number test_result
#> <int> <int> <int>
#> 1 123 1000 25
#> 2 123 1000 30
#> 3 25 1001 12
#> 4 25 1002 67
#> 5 25 1010 75
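If the goal is specifically the "one subset per id, then bind" pattern from the question, a dplyr version might look like the sketch below. It assumes dplyr is installed, rebuilds the test data as a plain data.frame so the snippet runs on its own, and uses a made-up helper name, filter_one_facility. bind_rows() accepts a list of data frames, so it plays the role that do.call(rbind, ...) plays in base R.

```r
library(dplyr)

# Assumed reconstruction of the question's test data
df <- data.frame(
  facility_id    = c(123L, 123L, 25L, 25L, 25L, 65L, 22L, 22L),
  patient_number = c(1000L, 1000L, 1001L, 1002L, 1010L, 1009L, 1222L, 1223L),
  test_result    = c(25L, 30L, 12L, 67L, 75L, 8L, 95L, 89L)
)
facilities_id_list <- c(123, 25)

# Hypothetical helper: keep only the rows for a single facility id
filter_one_facility <- function(id) {
  filter(df, facility_id == id)
}

# One data frame per id, then stack them row-wise
per_facility <- lapply(facilities_id_list, filter_one_facility)
combined <- bind_rows(per_facility)
combined
```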