Filter a dataframe with "state" that can take both characters and numeric as input to colu

Time:10-08

I have a data frame like this:

        id <- c(5738180,51845,167774,517814,1344920,517833,51844)
        state_code <- c("AZ","CA","AZ","WA","MO","CA","AZ")
        state_rank <- c(1,2,1,3,4,2,1)
        df.sample <- data.frame(id,state_code,state_rank, stringsAsFactors=FALSE) 
    

df.sample

       id state_code state_rank
  5738180         AZ          1
    51845         CA          2
   167774         AZ          1
   517814         WA          3
  1344920         MO          4
   517833         CA          2
    51844         AZ          1

I am trying to create a function that takes a data frame and a state as inputs and returns the data frame filtered by that state.

The state argument should accept either state_code values or a state_rank value.

Desired outputs

If I pass in

state = "AZ", return rows filtered for state_code = "AZ"

       id state_code state_rank
  5738180         AZ          1
   167774         AZ          1
    51844         AZ          1

state = "WA,MO", return rows filtered for state_code = c("WA","MO")

       id state_code state_rank
   517814         WA          3
  1344920         MO          4

state = 2, return the top 2 ranked states, i.e. rows with state_rank <= 2

       id state_code state_rank
  5738180         AZ          1
    51845         CA          2
   167774         AZ          1
   517833         CA          2
    51844         AZ          1

I tried it this way, but I am not getting what I want:

library(dplyr)

func <- function(df, state){
    df %>% filter(state_code == state)
}

func(df.sample,state = c("AZ"))
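A quick check of why the == version falls short (an illustrative sketch, not part of the original post; it rebuilds df.sample and the func() shown above so it runs on its own). A single code works, but the literal string "WA,MO" matches no state_code, and the number 2 is coerced to the string "2", which also matches nothing. Switching to %in% handles a vector of codes.

```r
library(dplyr)

# Rebuild the sample data so this block is self-contained
df.sample <- data.frame(
  id = c(5738180, 51845, 167774, 517814, 1344920, 517833, 51844),
  state_code = c("AZ", "CA", "AZ", "WA", "MO", "CA", "AZ"),
  state_rank = c(1, 2, 1, 3, 4, 2, 1),
  stringsAsFactors = FALSE
)

# The attempt from the question
func <- function(df, state) {
  df %>% filter(state_code == state)
}

nrow(func(df.sample, state = "AZ"))     # 3: a single code matches
nrow(func(df.sample, state = "WA,MO"))  # 0: no code equals the literal "WA,MO"
nrow(func(df.sample, state = 2))        # 0: 2 becomes "2", which matches no code

# %in% tests membership in a set of codes instead of element-wise equality
nrow(filter(df.sample, state_code %in% c("WA", "MO")))  # 2
```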

I'd really appreciate it if someone can point me in the right direction.

Answer:

Polymorphic arguments can be handy at times, but they have bitten me too many times to be really useful, especially when the two types/variables have no real connection. I suggest being explicit and giving each variable its own argument:

func <- function(x, codes, rank) {
  if (!missing(codes)) {
    codes <- unlist(strsplit(codes, ",", fixed = TRUE), use.names = FALSE)
    x <- subset(x, state_code %in% codes)
  }
  if (!missing(rank)) {
    x <- subset(x, state_rank <= rank)
  }
  x
}

func(df.sample, codes="WA,MO")   # since your example included the literal "WA,MO"
#        id state_code state_rank
# 4  517814         WA          3
# 5 1344920         MO          4
func(df.sample, codes=c("WA","MO"))
#        id state_code state_rank
# 4  517814         WA          3
# 5 1344920         MO          4

func(df.sample, rank=2)
#        id state_code state_rank
# 1 5738180         AZ          1
# 2   51845         CA          2
# 3  167774         AZ          1
# 6  517833         CA          2
# 7   51844         AZ          1

Answer:

You could pass ("tunnel") your filter condition straight through to dplyr::filter() using the embrace operator {{ }}:

library(dplyr)

f <- function(df, cond){
  df %>% 
    filter({{ cond }})
}

Output

f(df.sample, state_rank <= 2)
       id state_code state_rank
1 5738180         AZ          1
2   51845         CA          2
3  167774         AZ          1
4  517833         CA          2
5   51844         AZ          1
f(df.sample, state_code %in% c("WA", "MO"))
       id state_code state_rank
1  517814         WA          3
2 1344920         MO          4
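If you want to pass more than one condition, a minimal sketch (assuming dplyr is installed; f_multi is a hypothetical name) is to forward the dots instead of embracing a single argument. filter() combines multiple conditions with AND, so the call below keeps only the CA rows with state_rank <= 2.

```r
library(dplyr)

df.sample <- data.frame(
  id = c(5738180, 51845, 167774, 517814, 1344920, 517833, 51844),
  state_code = c("AZ", "CA", "AZ", "WA", "MO", "CA", "AZ"),
  state_rank = c(1, 2, 1, 3, 4, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical variant of f(): every condition in ... is forwarded to filter()
f_multi <- function(df, ...) {
  df %>%
    filter(...)
}

# Keeps the two CA rows (ids 51845 and 517833)
f_multi(df.sample, state_rank <= 2, state_code == "CA")
```

Forwarding ... keeps the conditions unevaluated until filter() sees them, just as {{ }} does for a single argument.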

Answer:

You could have two different input arguments, one for state_code and another for state_rank, but if you'd like to keep it as a single argument, you'll need an if statement to decide whether you are filtering by state_code or by state_rank.

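library(dplyr)

func <- function(df, state){
    if (is.numeric(state)) {
        # numeric input: keep the top-ranked states
        df %>% filter(state_rank <= state)
    } else {
        # character input: keep the listed state codes
        df %>% filter(state_code %in% state)
    }
}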

You may also want to handle edge cases such as state = NA, but hopefully this gets you started.
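A minimal sketch of the single-argument approach that also accepts the comma-separated string "WA,MO" from the question and rejects NA up front. The name filter_state and the choice to stop() on NA are assumptions for illustration, not part of the original answer.

```r
library(dplyr)

df.sample <- data.frame(
  id = c(5738180, 51845, 167774, 517814, 1344920, 517833, 51844),
  state_code = c("AZ", "CA", "AZ", "WA", "MO", "CA", "AZ"),
  state_rank = c(1, 2, 1, 3, 4, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: numeric input is a maximum rank,
# character input is one or more (possibly comma-separated) state codes
filter_state <- function(df, state) {
  # One possible NA policy: refuse missing or empty input outright
  if (length(state) == 0 || anyNA(state)) {
    stop("state must be a non-missing code or rank")
  }
  if (is.numeric(state)) {
    df %>% filter(state_rank <= state)
  } else {
    # "WA,MO" -> c("WA", "MO"); a vector like c("WA", "MO") passes through unchanged
    codes <- trimws(unlist(strsplit(state, ",", fixed = TRUE)))
    df %>% filter(state_code %in% codes)
  }
}

filter_state(df.sample, "AZ")     # the 3 AZ rows
filter_state(df.sample, "WA,MO")  # the WA and MO rows
filter_state(df.sample, 2)        # the 5 rows with state_rank <= 2
```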

Answer:

Using base R's subset():

f1 <- function(dat, state_codes) {
  subset(dat, state_code %in% state_codes)
}

Testing:

> f1(df.sample, c("AZ"))
       id state_code state_rank
1 5738180         AZ          1
3  167774         AZ          1
7   51844         AZ          1
> f1(df.sample, c("WA", "MO"))
       id state_code state_rank
4  517814         WA          3
5 1344920         MO          4
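R's own help page for subset() describes it as a convenience function intended for interactive use and recommends standard subsetting like [ when programming. A minimal sketch of the same filter written that way (f2 is a hypothetical name); it returns the same rows and keeps the original row names.

```r
df.sample <- data.frame(
  id = c(5738180, 51845, 167774, 517814, 1344920, 517833, 51844),
  state_code = c("AZ", "CA", "AZ", "WA", "MO", "CA", "AZ"),
  state_rank = c(1, 2, 1, 3, 4, 2, 1),
  stringsAsFactors = FALSE
)

# Same idea as f1(), using a logical row index instead of subset()
f2 <- function(dat, state_codes) {
  dat[dat$state_code %in% state_codes, , drop = FALSE]
}

f2(df.sample, c("WA", "MO"))  # rows 4 and 5
```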