format of do.call statement

Time:01-11

I am trying to understand how to properly combine lapply, rbind and do.call in one statement, and I can't get the statement to run. I have supplied a simple example function and data that I'm using to learn the syntax. I fully understand that this scenario could be run with a simpler method; the purpose is only to understand the formatting and how to use lapply and rbind with a custom function.

Here's some test data:

facility_id  patient_number  test_result
123          1000            25
123          1000            30
25           1001            12
25           1002            67
25           1010            75
65           1009            8
22           1222            95
22           1223            89

I'm essentially trying to subset the data inside a custom function, once for each value in a list of facility ids, and then bind together the data tables that the custom function returns.

Here's the code I've used:

facilities_id_list<-c(123, 25)
facility_counts<-function(facilities_id_list){
  facility<-facilities_id_list[[i]]
  subset<-data[facility_id==facility]
}

results <- do.call("rbind", lapply(seq_along(facilities_id_list), function(i) facility_counts)

The result I'm hoping to achieve:

facility_id  patient_number  test_result
123          1000            25
123          1000            30
25           1001            12
25           1002            67
25           1010            75

Why does this not work? Do I need to change the formatting?

CodePudding user response:

Instead of looping with ==, use %in% to subset directly:

subset(data, facility_id %in% facilities_id_list)

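The OP's code has several issues: 1) the function's argument is named facilities_id_list, but the body indexes with i, the loop variable from lapply, which is not visible inside the function, so the function should take i as its argument; 2) facility_id==facility should be data$facility_id==facility, because [ on a data.frame does not look up column names inside data; 3) the row index needs a trailing comma, because a single index without a comma is treated as a column index in a data.frame. In addition, function(i) facility_counts returns the function itself instead of calling it; passing facility_counts directly to lapply avoids that.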

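# take the position i and look up the matching facility id
facility_counts<-function(i){
  facility<-facilities_id_list[[i]]
  # the trailing comma selects rows; all columns are kept
  data[data$facility_id == facility, ]
}
> do.call(rbind, lapply(seq_along(facilities_id_list), facility_counts))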
  facility_id patient_number test_result
1         123           1000          25
2         123           1000          30
3          25           1001          12
4          25           1002          67
5          25           1010          75
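
To make the role of do.call more concrete, here is a minimal sketch in base R. It assumes the question's table is rebuilt as a plain data.frame named data (typed in by hand here), and it uses a variant of facility_counts that takes the facility id itself rather than a position, which is my simplification and not the OP's original signature. The point is that do.call(rbind, pieces) calls rbind with each list element as a separate argument.

```r
# Toy data mirroring the question's table (rebuilt by hand for illustration).
data <- data.frame(
  facility_id    = c(123, 123, 25, 25, 25, 65, 22, 22),
  patient_number = c(1000, 1000, 1001, 1002, 1010, 1009, 1222, 1223),
  test_result    = c(25, 30, 12, 67, 75, 8, 95, 89)
)
facilities_id_list <- c(123, 25)

# Assumed variant: the argument is the facility id itself, not an index.
facility_counts <- function(facility) {
  data[data$facility_id == facility, ]
}

# lapply() returns a list holding one data frame per facility id.
pieces <- lapply(facilities_id_list, facility_counts)

# do.call() passes the list elements to rbind as separate arguments,
# so both lines below build the same data frame.
results  <- do.call(rbind, pieces)
results2 <- rbind(pieces[[1]], pieces[[2]])

identical(results, results2)
```

Writing out rbind(pieces[[1]], pieces[[2]]) only works when the number of pieces is known in advance; do.call handles a list of any length, which is why it is usually paired with lapply.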

CodePudding user response:

Here's an example of using plain old filter, and then another option of using a custom function with do.call():

library(dplyr)


# data
df <- tibble::tribble(
  ~facility_id, ~patient_number, ~test_result,
  123L,           1000L,          25L,
  123L,           1000L,          30L,
  25L,           1001L,          12L,
  25L,           1002L,          67L,
  25L,           1010L,          75L,
  65L,           1009L,           8L,
  22L,           1222L,          95L,
  22L,           1223L,          89L
)

facilities_id_list<-c(123, 25)

# simplest solution: just using filter
df %>% 
  filter(facility_id %in% facilities_id_list)
#> # A tibble: 5 × 3
#>   facility_id patient_number test_result
#>         <int>          <int>       <int>
#> 1         123           1000          25
#> 2         123           1000          30
#> 3          25           1001          12
#> 4          25           1002          67
#> 5          25           1010          75

# using a custom function with do.call
custom_filter <- function(data) {
  data %>% 
    filter(facility_id %in% facilities_id_list)
}

do.call(custom_filter, list(df))
#> # A tibble: 5 × 3
#>   facility_id patient_number test_result
#>         <int>          <int>       <int>
#> 1         123           1000          25
#> 2         123           1000          30
#> 3          25           1001          12
#> 4          25           1002          67
#> 5          25           1010          75
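
As a side note on the shape of the do.call call itself: the second argument is always a list of the arguments for the function, and the list may be named. Below is a rough base R sketch of that. The ids parameter is a hypothetical addition of mine so the function does not depend on a global facilities_id_list, and the data is a hand-typed data.frame copy of the table above rather than the tibble.

```r
# Hand-typed copy of the example data as a base data.frame.
df <- data.frame(
  facility_id    = c(123, 123, 25, 25, 25, 65, 22, 22),
  patient_number = c(1000, 1000, 1001, 1002, 1010, 1009, 1222, 1223),
  test_result    = c(25, 30, 12, 67, 75, 8, 95, 89)
)

# Hypothetical variant of custom_filter: the ids are passed in explicitly.
custom_filter <- function(data, ids) {
  data[data$facility_id %in% ids, ]
}

# Each element of the list becomes one argument of the call;
# the names are matched to the function's parameter names.
args <- list(data = df, ids = c(123, 25))
filtered <- do.call(custom_filter, args)

# Same result as calling the function directly.
identical(filtered, custom_filter(df, c(123, 25)))
```

With a fixed set of arguments like this, calling the function directly is simpler; do.call is mainly useful when the argument list is built programmatically.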