I have a data.frame that assigns ids to groups. In the simplest scenario each id is assigned to a different group:
df1 <- data.frame(group = c("a1", "a2"),
                  id = c("i1", "i2"),
                  stringsAsFactors = FALSE)
In a second scenario all ids are assigned to one group:
df2 <- data.frame(group = c("a1", "a1"),
                  id = c("i1", "i2"),
                  stringsAsFactors = FALSE)
And in the third scenario the id-to-group assignment is ambiguous, with at least one id assigned to more than one group:
df3 <- data.frame(group = c("a1", "a2", "a2"),
                  id = c("i1", "i1", "i2"),
                  stringsAsFactors = FALSE)
I'm looking for a function that, given such a data.frame with the id and group columns, returns a label "scenario1"/"scenario2"/"scenario3" according to the scenarios above.
In other words, this function would return "scenario1" for df1, "scenario2" for df2, and "scenario3" for df3.
Obviously this can be done with if statements, but I'm hoping for something faster using dplyr/tidyverse or data.table.
CodePudding user response:
Here's a function to check different conditions.
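library(dplyr)

return_scenario <- function(df) {
  # Work on the distinct (group, id) pairs so repeated rows don't affect the checks
  tmp <- df %>% distinct(group, id)
  case_when(
    # All ids fall in one single group
    n_distinct(tmp$group) == 1 ~ 'scenario 2',
    # No id appears with more than one group
    n_distinct(tmp$id) == nrow(tmp) ~ 'scenario 1',
    # Otherwise at least one id maps to several groups
    TRUE ~ 'scenario 3')
}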
return_scenario(df1)
#[1] "scenario 1"
return_scenario(df2)
#[1] "scenario 2"
return_scenario(df3)
#[1] "scenario 3"
If needed, this can also be translated to base R or data.table using their equivalent functions.
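As a rough illustration of such a translation, here is how the same checks might look in base R. This is only a sketch: the name return_scenario_base is made up for this example, it keeps the spaced labels ('scenario 1' etc.) returned by the function above, and it assumes the columns are called group and id as in the question.

```r
# Hypothetical base R counterpart of return_scenario(); no packages required.
return_scenario_base <- function(df) {
  # Keep one row per distinct (group, id) pair
  tmp <- unique(df[, c("group", "id")])
  if (length(unique(tmp$group)) == 1) {
    # All ids fall in one single group
    "scenario 2"
  } else if (anyDuplicated(tmp$id) == 0) {
    # No id appears with more than one group
    "scenario 1"
  } else {
    # At least one id maps to several groups
    "scenario 3"
  }
}

# The three example data frames from the question
df1 <- data.frame(group = c("a1", "a2"), id = c("i1", "i2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(group = c("a1", "a1"), id = c("i1", "i2"),
                  stringsAsFactors = FALSE)
df3 <- data.frame(group = c("a1", "a2", "a2"), id = c("i1", "i1", "i2"),
                  stringsAsFactors = FALSE)

return_scenario_base(df1)
return_scenario_base(df2)
return_scenario_base(df3)
```

In data.table, uniqueN() would play the role of n_distinct() and unique() on the data.table would replace distinct(). Note that, like the dplyr version, a data.frame with a single distinct group is always labelled 'scenario 2', even if it has only one row.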