Each ID records a series of signal labels: "alpha", "beta" and "unknown".
If an ID has only two distinct labels, I wish to assign the dominating label to all of its records, i.e. if the recorded labels of an ID are c("alpha", "alpha", "unknown"), they become c("alpha", "alpha", "alpha").
Can someone please help me with this?
library(tidyverse)
# Data preparation (you can directly work with the tbl below)
ID <- c(rep("A", 14), rep("B", 14), rep("C", 10), rep("D", 22), rep("E", 2))
series <- c(11, 3, 12, 2, 8, 2, 11, 8, 3, 2)
# One label per run: < 5 -> "unknown", > 10 -> "alpha", otherwise (5 to 10) -> "beta";
# each label is then repeated for the length of its run
label <- rep(case_when(series < 5 ~ "unknown",
                       series > 10 ~ "alpha",
                       TRUE ~ "beta"),
             times = series)
# tbl
tbl <- tibble(ID = ID, label = label)
CodePudding user response:
If I understood it correctly, from this
tbl %>% group_by(ID) %>% summarise(n_distinct(label))
  ID    `n_distinct(label)`
  <chr>               <int>
1 A                       2
2 B                       2
3 C                       2
4 D                       3
5 E                       1
We want to update the labels for IDs A, B and C, but not D or E. We can use the table() function to find the most frequent label within each of those IDs.
tbl2 <- tbl %>%
  group_by(ID) %>%
  # For IDs with exactly two distinct labels, overwrite every label with the most frequent one
  mutate(label = if (n_distinct(label) == 2) names(which.max(table(label))) else label)
This now gives the following number of distinct labels per ID:
tbl2 %>% group_by(ID) %>% summarise(n_distinct(label))
  ID    `n_distinct(label)`
  <chr>               <int>
1 A                       1
2 B                       1
3 C                       1
4 D                       3
5 E                       1
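To see what names(which.max(table(label))) does for a single ID, here is a minimal base R sketch using the example vector from the question. The variable names (x, counts, dominant, result, tied) are only illustrative. It also shows what should happen when two labels are equally frequent, a case the question does not specify.

```r
# Labels of one ID, taken from the example in the question
x <- c("alpha", "alpha", "unknown")

# table() counts each distinct label; which.max() returns the first largest count
counts <- table(x)
dominant <- names(which.max(counts))

# Repeat the dominating label over the whole vector
result <- rep(dominant, length(x))
print(result)

# On a tie, which.max() returns the first maximum, and table() orders the
# labels alphabetically, so the alphabetically first label is picked
tied <- names(which.max(table(c("unknown", "alpha"))))
print(tied)
```

If a tie should be resolved differently (for example, never picking "unknown"), that rule would have to be added explicitly before calling which.max().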