Creating a function with multiple arguments that subsets a dataframe [R]

Time:04-19

I have a data frame named titanic with 2021 rows, one per passenger on the Titanic, recording specific characteristics of each passenger:

  Class  Sex   Age Survived
1   3rd Male Child       No
2   3rd Male Child       No
3   3rd Male Child       No
4   3rd Male Child       No
5   3rd Male Child       No
6   3rd Male Child       No
...

I want to create a function that has multiple arguments that looks something like this:

f1 <- function(sex, age, class, survived){
...
}

where each argument specifies one criterion for the passengers. As an example, I want to be able to pass criteria to the function such that

f1("Female", "Child","3rd", "Yes")

returns

     Class    Sex   Age Survived
1534   3rd Female Child      Yes
1535   3rd Female Child      Yes
1536   3rd Female Child      Yes
1537   3rd Female Child      Yes
1538   3rd Female Child      Yes

For now, I have hard-coded it with an if/else chain that covers every possible combination:

function.q6.1 <- function(sex,age,class,survival){
  if(sex == "Male" & age == "Child" & class == "3rd" & survival == "No"){
    subset(titanic, Sex == "Male" & Age == "Child" & Class == "3rd" & Survived == "No")
  }
  else if(sex == "Female" & age == "Child" & class == "3rd" & survival == "No"){
    subset(titanic, Sex == "Female" & Age == "Child" & Class == "3rd" & Survived == "No")
  }
  else if(sex == "Male" & age == "Adult" & class == "3rd" & survival == "No"){
    subset(titanic, Sex == "Male" & Age == "Adult" & Class == "3rd" & Survived == "No")
  }
...
}

I want to know if there is a more efficient way of doing this. Thank you ahead of time.

CodePudding user response:

If you are using a data.frame like the one shown in your question (called df here), you could use

library(dplyr)
my_filter <- function(sex, age, class, survived) {

  df %>% 
    filter(Sex == sex, Age == age, Class == class, Survived == survived)

}

Now my_filter("Female", "Child","3rd", "Yes") returns

   Class    Sex   Age Survived
7    3rd Female Child      Yes
8    3rd Female Child      Yes
9    3rd Female Child      Yes
10   3rd Female Child      Yes
11   3rd Female Child      Yes 
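
As a variation on the function above, the data frame can be passed in as an argument instead of being read from the global environment. The sketch below is only an illustration: passengers is a made-up stand-in for the real data, my_filter2 is a hypothetical name, and the .data / .env pronouns are used on the assumption that you want column names and argument names kept apart explicitly.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data; only the column names match
passengers <- data.frame(
  Class    = c("3rd", "3rd", "1st", "3rd"),
  Sex      = c("Female", "Male", "Female", "Female"),
  Age      = c("Child", "Child", "Adult", "Child"),
  Survived = c("Yes", "No", "Yes", "No")
)

# Same filter as my_filter(), but the data frame is an explicit argument.
# .data$ marks a name as a column; .env$ marks it as a function argument.
my_filter2 <- function(data, sex, age, class, survived) {
  data %>%
    filter(.data$Sex == .env$sex,
           .data$Age == .env$age,
           .data$Class == .env$class,
           .data$Survived == .env$survived)
}

res <- my_filter2(passengers, "Female", "Child", "3rd", "Yes")
res
```

Being explicit with the pronouns matters most when a column and an argument share exactly the same name (for example a column sex and an argument sex), because a bare sex == sex inside filter() would compare the column with itself.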

CodePudding user response:

#toy dataset
set.seed(1912)
titanic <- data.frame(class = sample(c("1st", "2nd", "3rd"), 100, replace = TRUE),
                      sex = sample(c("Male", "Female"), 100, replace = TRUE),
                      age = sample(c("Child", "Adult"), 100, replace = TRUE),
                      survival = sample(c("Yes", "No"), 100, replace = TRUE)
                      )

# keep only the rows where every column equals the matching argument
f1 <- function(sex, age, class, survival) {
  titanic[titanic$class == class & titanic$sex == sex &
            titanic$age == age & titanic$survival == survival, ]
}

f1("Female", "Child","3rd", "Yes")

   class    sex   age survival
11   3rd Female Child      Yes
15   3rd Female Child      Yes
38   3rd Female Child      Yes
71   3rd Female Child      Yes
85   3rd Female Child      Yes
94   3rd Female Child      Yes

CodePudding user response:

This assumes that the first argument is the data frame and the remaining arguments are values for each of the columns in the order that they appear in the data frame.

The mapply compares successive columns to successive argument values returning a logical matrix. The apply returns one logical value per row and then we subscript by that.

The test call below uses the data frame dat, which is defined reproducibly in the Note at the end.

f1 <- function(dat, ...) {
  dat <- na.omit(dat)
  dat[apply(mapply(`==`, dat, list(...)), 1, all), ]
}

f1(dat, "3rd", "Male", "Child", "No")
##   Class  Sex   Age Survived
## 1   3rd Male Child       No
## 2   3rd Male Child       No
## 3   3rd Male Child       No
## 4   3rd Male Child       No
## 5   3rd Male Child       No
## 6   3rd Male Child       No

Note

Lines <- "
Class  Sex   Age Survived
1   3rd Male Child       No
2   3rd Male Child       No
3   3rd Male Child       No
4   3rd Male Child       No
5   3rd Male Child       No
6   3rd Male Child       No"
dat <- read.table(text = Lines)
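
To make the mapply / apply steps described above easier to follow, here is a small sketch that prints each intermediate result. The data frame toy is made up for illustration (it is not the asker's data), and hits and keep are just names chosen here.

```r
# Small hypothetical data frame with the same column order as dat
toy <- data.frame(
  Class    = c("3rd", "3rd", "1st"),
  Sex      = c("Male", "Female", "Male"),
  Age      = c("Child", "Child", "Adult"),
  Survived = c("No", "No", "Yes")
)

# Step 1: compare each column with the criterion in the same position.
# mapply() simplifies the per-column results into a logical matrix with
# one row per data row and one column per data column.
hits <- mapply(`==`, toy, list("3rd", "Male", "Child", "No"))
hits

# Step 2: a row qualifies only if every comparison in that row is TRUE.
keep <- apply(hits, 1, all)
keep

# Step 3: subscript the data frame by the logical vector.
toy[keep, ]
```

Only the first row of toy satisfies all four criteria, so it is the only row returned.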

CodePudding user response:

Maybe another strategy could be:

library(dplyr)
library(stringr)

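# criteria to match, collapsed into a single regex alternation
f1 <- c("Female", "Child", "3rd", "Yes")
f1 <- paste(f1, collapse = "|")

my_function <- function(df){
  df %>% 
    select(Sex, Age, Class, Survived) %>% 
    # keep rows where every selected column matches one of the criteria
    filter(if_all(everything(), ~str_detect(., f1)))
}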

my_function(df)

output:

        Sex   Age Class Survived
1534 Female Child   3rd      Yes
1535 Female Child   3rd      Yes
1536 Female Child   3rd      Yes
1537 Female Child   3rd      Yes
1538 Female Child   3rd      Yes