Create a column Type based on several conditions with different dataframe inputs in R-CodePudding

I have a dataframe like this

id <- c(100,101,102,103,104,105,106,107,108,109,110)
state_code <- c("CA","CA","CA","CA","CA","CA","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)

I am trying to use several filters with different cases of dataframe inputs.

Here are the conditions that I am working with

If total rows of the whole dataframe < 5, print "Not enough ids"

Example:

id <- c(100,101,102,103)
state_code <- c("CA","CA","TX","CA")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)

Desired output

"Not enough ids"

If total rows >=5 and if the rows of any individual states in state_code >=5, then create a column Type = state_code or else Type = "combined"

Example:

id <- c(100,101,102,103,104,105,106,107,108,109,110,111,112,113,114)
state_code <- c("CA","CA","CA","CA","CA","CA","TX","TX","TX","TX","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)

Desired Output

    id state_code     Type
   100         CA       CA
   101         CA       CA
   102         CA       CA
   103         CA       CA
   104         CA       CA
   105         CA       CA
   106         TX       TX
   107         TX       TX
   108         TX       TX
   109         TX       TX
   110         TX       TX
   111         TX       TX
   112         AZ Combined
   113         MN Combined
   114         CO Combined

If total rows >=5 and if the rows of any individual states in state_code are not >=5, then create a column Type = "combined" for all values

Example:

id <- c(100,101,102,103,104,105,106,107,108,109,110)
state_code <- c("CA","CA","CA","CA","TX","TX","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)

Desired Output

    id state_code     Type
   100         CA Combined
   101         CA Combined
   102         CA Combined
   103         CA Combined
   104         TX Combined
   105         TX Combined
   106         TX Combined
   107         TX Combined
   108         AZ Combined
   109         MN Combined
   110         CO Combined

I am trying to do it this way for 1st case but not able to do so for others

if(nrow(df.sample < 5){
    cat("Not enough ids")
}

How do I wrap all this logic into a single code? Can someone point me in the right direction?

CodePudding user response：

Will this work:

library(dplyr)

rowscount <- function(df, id_col){
  if(nrow(df) < 5)
    return('Not enough ids')
  else{
    op_df = df %>% group_by({{id_col}}) %>% mutate(Type = if_else(n() >= 5, 'state_code', 'combined'))
  return(op_df)
  }
}
rowscount(df.sample, state_code)
# A tibble: 11 x 3
# Groups:   state_code [5]
      id state_code Type      
   <dbl> <chr>      <chr>     
 1   100 CA         state_code
 2   101 CA         state_code
 3   102 CA         state_code
 4   103 CA         state_code
 5   104 CA         state_code
 6   105 CA         state_code
 7   106 TX         combined  
 8   107 TX         combined  
 9   108 AZ         combined  
10   109 MN         combined  
11   110 CO         combined  

id <- c(100,101,102,103)
state_code <- c("CA","CA","TX","CA")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)

rowscount(df.sample, state_code)
[1] "Not enough ids"

CodePudding user response：

Condition 2 and 3 are the same so can be combined together. Try this function.

library(dplyr)

foo <- function(data){
  if(nrow(data) < 5 ) {
    return("Not enough ids")
  } else {
    data %>%
      group_by(state_code) %>%
      mutate(Type = case_when(n() < 5 ~ 'Combined', 
                              TRUE ~state_code)) %>%
      ungroup
  }
}