I have a dataframe like this:
id <- c(100,101,102,103,104,105,106,107,108,109,110)
state_code <- c("CA","CA","CA","CA","CA","CA","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)
I am trying to apply different rules depending on the input dataframe.
Here are the conditions that I am working with:
- If total rows of the whole dataframe < 5, print "Not enough ids"
Example:
id <- c(100,101,102,103)
state_code <- c("CA","CA","TX","CA")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)
Desired output
"Not enough ids"
- If there are at least 5 rows in total, then for each state in state_code that has at least 5 rows, create a column Type set to that state's code; all other rows get Type = "Combined"
Example:
id <- c(100,101,102,103,104,105,106,107,108,109,110,111,112,113,114)
state_code <- c("CA","CA","CA","CA","CA","CA","TX","TX","TX","TX","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)
Desired Output
id state_code Type
100 CA CA
101 CA CA
102 CA CA
103 CA CA
104 CA CA
105 CA CA
106 TX TX
107 TX TX
108 TX TX
109 TX TX
110 TX TX
111 TX TX
112 AZ Combined
113 MN Combined
114 CO Combined
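To see which states qualify under this rule, a quick base R check (assuming the threshold of 5 from the question; `counts` and `qualifying` are illustrative names) tabulates the rows per state and labels each row accordingly:

```r
id <- c(100,101,102,103,104,105,106,107,108,109,110,111,112,113,114)
state_code <- c("CA","CA","CA","CA","CA","CA","TX","TX","TX","TX","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id, state_code, stringsAsFactors = FALSE)

# Number of rows per state (table() sorts the names alphabetically)
counts <- table(df.sample$state_code)
# States with at least 5 rows keep their own label
qualifying <- names(counts)[counts >= 5]
# Each row gets its own state code if that state qualifies, else "Combined"
df.sample$Type <- ifelse(df.sample$state_code %in% qualifying,
                         df.sample$state_code, "Combined")
df.sample
```

Here CA and TX have 6 rows each, so only AZ, MN and CO end up as "Combined", matching the desired output.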
- If there are at least 5 rows in total but no individual state in state_code has at least 5 rows, create a column Type = "Combined" for all rows
Example:
id <- c(100,101,102,103,104,105,106,107,108,109,110)
state_code <- c("CA","CA","CA","CA","TX","TX","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)
Desired Output
id state_code Type
100 CA Combined
101 CA Combined
102 CA Combined
103 CA Combined
104 TX Combined
105 TX Combined
106 TX Combined
107 TX Combined
108 AZ Combined
109 MN Combined
110 CO Combined
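This third rule is really the second rule with no qualifying states. The same base R check, again assuming a threshold of 5, shows that the list of qualifying states comes out empty, so every row falls through to "Combined" without a separate branch:

```r
id <- c(100,101,102,103,104,105,106,107,108,109,110)
state_code <- c("CA","CA","CA","CA","TX","TX","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id, state_code, stringsAsFactors = FALSE)

counts <- table(df.sample$state_code)
any(counts >= 5)   # FALSE: CA and TX have only 4 rows each
# Empty character vector, so no row keeps its own state code
qualifying <- names(counts)[counts >= 5]
df.sample$Type <- ifelse(df.sample$state_code %in% qualifying,
                         df.sample$state_code, "Combined")
df.sample
```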
I managed the first case this way, but I cannot work out the others:
if (nrow(df.sample) < 5) {
cat("Not enough ids")
}
How can I wrap all this logic into a single function? Can someone point me in the right direction?
Answer:
Will this work?
library(dplyr)
rowscount <- function(df, id_col){
  # Fewer than 5 rows in total: return the message instead of a data frame
  if(nrow(df) < 5)
    return('Not enough ids')
  else{
    # Within each group, n() is the group size: keep the state code for
    # groups with at least 5 rows, otherwise label the rows 'Combined'
    op_df = df %>% group_by({{id_col}}) %>% mutate(Type = if (n() >= 5) {{id_col}} else 'Combined')
    return(op_df)
  }
}
rowscount(df.sample, state_code)
# A tibble: 11 x 3
# Groups: state_code [5]
id state_code Type
<dbl> <chr> <chr>
1 100 CA CA
2 101 CA CA
3 102 CA CA
4 103 CA CA
5 104 CA CA
6 105 CA CA
7 106 TX Combined
8 107 TX Combined
9 108 AZ Combined
10 109 MN Combined
11 110 CO Combined
With the 4-row example from the question:
id <- c(100,101,102,103)
state_code <- c("CA","CA","TX","CA")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)
rowscount(df.sample, state_code)
[1] "Not enough ids"
Answer:
Conditions 2 and 3 follow the same rule (a state keeps its own label only if it has at least 5 rows), so they can be handled together. Try this function:
library(dplyr)
foo <- function(data){
  # Rule 1: too few rows overall
  if(nrow(data) < 5) {
    return("Not enough ids")
  } else {
    # Rules 2 and 3: n() is the number of rows in the current state group
    data %>%
      group_by(state_code) %>%
      mutate(Type = case_when(n() < 5 ~ 'Combined',
                              TRUE ~ state_code)) %>%
      ungroup()
  }
}
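If you would rather avoid the dplyr dependency, the same logic can be sketched in base R. This is a minimal sketch, not taken from either answer: `label_states`, `min_rows` and `min_state` are hypothetical names, and both thresholds default to 5 as in the question. Like the answers, it returns the message instead of printing it; wrap the call in cat() if you want it printed.

```r
# Hypothetical helper: applies all three rules from the question in base R.
label_states <- function(df, min_rows = 5, min_state = 5) {
  # Rule 1: too few rows in the whole data frame
  if (nrow(df) < min_rows) {
    return("Not enough ids")
  }
  codes <- as.character(df$state_code)
  # Rows per state, looked up by name for every row of df
  state_n <- as.integer(table(codes)[codes])
  # Rules 2 and 3: keep the state code for states with >= min_state rows,
  # label everything else "Combined" (every row, if no state qualifies)
  df$Type <- ifelse(state_n >= min_state, codes, "Combined")
  df
}

# The three examples from the question
ex1 <- data.frame(id = c(100, 101, 102, 103),
                  state_code = c("CA", "CA", "TX", "CA"),
                  stringsAsFactors = FALSE)
ex2 <- data.frame(id = 100:114,
                  state_code = c(rep("CA", 6), rep("TX", 6), "AZ", "MN", "CO"),
                  stringsAsFactors = FALSE)
ex3 <- data.frame(id = 100:110,
                  state_code = c(rep("CA", 4), rep("TX", 4), "AZ", "MN", "CO"),
                  stringsAsFactors = FALSE)

label_states(ex1)  # "Not enough ids"
label_states(ex2)
label_states(ex3)
```

The table lookup plays the role of group_by() plus n() in the answers above: it gives each row the size of its state's group.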