I have a dataframe like this
id <- c(5738180,51845,167774,517814,1344920,517833,51844)
state_code <- c("AZ","CA","AZ","WA","MO","CA","AZ")
state_rank <- c(1,2,1,3,4,2,1)
df.sample <- data.frame(id,state_code,state_rank, stringsAsFactors=FALSE)
df.sample
id state_code state_rank
5738180 AZ 1
51845 CA 2
167774 AZ 1
517814 WA 3
1344920 MO 4
517833 CA 2
51844 AZ 1
I am trying to create a function that takes a df and a state as inputs and returns a df filtered by that state.
The state variable should accept either a state_code or a state_rank as input.
Desired outputs
If I pass in
state = "AZ", return rows filtered for state_code = "AZ"
id state_code state_rank
5738180 AZ 1
167774 AZ 1
51844 AZ 1
state = "WA,MO", return rows filtered for state_code = c("WA","MO")
id state_code state_rank
517814 WA 3
1344920 MO 4
state = 2, return the top 2 ranked states (state_rank <= 2)
id state_code state_rank
5738180 AZ 1
51845 CA 2
167774 AZ 1
517833 CA 2
51844 AZ 1
I am trying to do it this way but not getting what I wanted
func <- function(df, state){
df %>% filter(state_code == state)
}
func(df.sample,state = c("AZ"))
I'd really appreciate it if someone can point me in the right direction.
CodePudding user response:
I find polymorphic arguments to be cool at times, but they have bitten me too many times to be really useful, especially when the two types/variables have no real connection. I suggest being explicit about the variables:
func <- function(x, codes, rank) {
if (!missing(codes)) {
codes <- unlist(strsplit(codes, ",", fixed = TRUE), use.names = FALSE)
x <- subset(x, state_code %in% codes)
}
if (!missing(rank)) {
x <- subset(x, state_rank <= rank)
}
x
}
func(df.sample, codes="WA,MO") # since your example included the literal "WA,MO"
# id state_code state_rank
# 4 517814 WA 3
# 5 1344920 MO 4
func(df.sample, codes=c("WA","MO"))
# id state_code state_rank
# 4 517814 WA 3
# 5 1344920 MO 4
func(df.sample, rank=2)
# id state_code state_rank
# 1 5738180 AZ 1
# 2 51845 CA 2
# 3 167774 AZ 1
# 6 517833 CA 2
# 7 51844 AZ 1
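Because the two arguments are independent, they can also be supplied together, in which case both filters apply. A small sketch that repeats the function above with the question's sample data; the combined call is an extra illustration, not something the question asked for:

```r
# Sample data from the question
df.sample <- data.frame(
  id = c(5738180, 51845, 167774, 517814, 1344920, 517833, 51844),
  state_code = c("AZ", "CA", "AZ", "WA", "MO", "CA", "AZ"),
  state_rank = c(1, 2, 1, 3, 4, 2, 1),
  stringsAsFactors = FALSE
)

# Same function as above: each filter runs only if its argument was supplied
func <- function(x, codes, rank) {
  if (!missing(codes)) {
    codes <- unlist(strsplit(codes, ",", fixed = TRUE), use.names = FALSE)
    x <- subset(x, state_code %in% codes)
  }
  if (!missing(rank)) {
    x <- subset(x, state_rank <= rank)
  }
  x
}

# Restrict to AZ and CA first, then keep only rank 1
res <- func(df.sample, codes = "AZ,CA", rank = 1)
res  # keeps the three AZ rows (ids 5738180, 167774, 51844)
```

Calling it with neither argument returns the data unchanged, since both blocks are skipped.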
CodePudding user response:
You could just tunnel your filter condition directly to dplyr::filter using {{ }}:
library(dplyr)
f <- function(df, cond){
df %>%
filter({{ cond }})
}
Output
f(df.sample, state_rank <= 2)
id state_code state_rank
1 5738180 AZ 1
2 51845 CA 2
3 167774 AZ 1
4 517833 CA 2
5 51844 AZ 1
f(df.sample, state_code %in% c("WA", "MO"))
id state_code state_rank
1 517814 WA 3
2 1344920 MO 4
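The other inputs from the question map onto conditions in the same way. A sketch, assuming dplyr is installed; splitting the "WA,MO" string with strsplit is just one possible way to handle the comma-separated form and is an assumption on top of the code above:

```r
library(dplyr)

# Sample data from the question
df.sample <- data.frame(
  id = c(5738180, 51845, 167774, 517814, 1344920, 517833, 51844),
  state_code = c("AZ", "CA", "AZ", "WA", "MO", "CA", "AZ"),
  state_rank = c(1, 2, 1, 3, 4, 2, 1),
  stringsAsFactors = FALSE
)

# Same function as above: the condition is forwarded to filter() unevaluated
f <- function(df, cond) {
  df %>%
    filter({{ cond }})
}

# state = "AZ"
az <- f(df.sample, state_code == "AZ")

# state = "WA,MO": split the string into c("WA", "MO") before matching
wa_mo <- f(df.sample, state_code %in% strsplit("WA,MO", ",", fixed = TRUE)[[1]])

# Conditions can be combined, because the whole expression is passed through
az_top <- f(df.sample, state_code == "AZ" & state_rank <= 2)
```

The trade-off is that the caller has to know the column names, since the function itself no longer decides which column to filter on.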
CodePudding user response:
You could have two different input arguments, one for state and another for state_rank, but if you'd like to keep it as a single argument, you'll need an if-statement to differentiate whether you are filtering by state or state_rank.
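library(dplyr)
func <- function(df, state){
  if (is.numeric(state)) {
    # numeric input: keep the top-ranked states, i.e. state_rank <= state
    df %>% filter(state_rank <= state)
  } else {
    # character input: match against state_code
    df %>% filter(state_code %in% state)
  }
}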
You may want to handle things like "what if you pass state=NA", but hopefully this helps you get started.
CodePudding user response:
Using subset
f1 <- function(dat, state_codes) {
subset(dat, state_code %in% state_codes)
}
Testing:
> f1(df.sample, c("AZ"))
id state_code state_rank
1 5738180 AZ 1
3 167774 AZ 1
7 51844 AZ 1
> f1(df.sample, c("WA", "MO"))
id state_code state_rank
4 517814 WA 3
5 1344920 MO 4