Reassigning labels using dplyr

Time:12-01

Each ID records a series of signal labels: "alpha", "beta" and "unknown". If an ID has only two distinct labels, I wish to assign the dominant (most frequent) label to all of its rows, i.e. if the recorded labels of an ID are c("alpha", "alpha", "unknown"), they become c("alpha", "alpha", "alpha").

Can someone please help me with this.

library(tidyverse)

# Data preparation (you can directly work with the tbl below)
ID <- c(rep("A", 14), rep("B", 14), rep("C", 10), rep("D", 22), rep("E", 2))
series <- c(11, 3, 12, 2, 8, 2, 11, 8, 3, 2)

# Map each run length to a label, then repeat that label series[i] times
label <- rep(
  case_when(series < 5  ~ "unknown",
            series > 10 ~ "alpha",
            TRUE        ~ "beta"),
  times = series)

# tbl
tbl <- tibble(ID = ID, 
              label = label)

CodePudding user response:

If I understood it correctly, from this

tbl %>% group_by(ID) %>% summarise(n_distinct(label))
  ID    `n_distinct(label)`
  <chr>               <int>
1 A                       2
2 B                       2
3 C                       2
4 D                       3
5 E                       1

We want to update the labels for IDs A, B and C, but not for D (three distinct labels) or E (only one). We can use the table() function to find the most frequent label within each of those IDs.

tbl2 <- tbl %>%
  group_by(ID) %>%
  mutate(label = if(n_distinct(label) == 2) names(which.max(table(label))) else label)
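To see what the names(which.max(table(label))) idiom does, here is a minimal base R sketch on the example vector from the question. The variable names (labels, counts, dominant, tied) are only illustrative. Note that which.max() returns the first maximum and table() orders character labels alphabetically, so a tie is resolved in favour of the alphabetically first label, which may or may not be what you want.

```r
# Example vector taken from the question text
labels <- c("alpha", "alpha", "unknown")

# Frequency of each distinct label (names are sorted alphabetically)
counts <- table(labels)

# Name of the first entry with the highest count
dominant <- names(which.max(counts))
dominant

# Tie case: both labels occur once, so the alphabetically first one is returned
tied <- names(which.max(table(c("unknown", "alpha"))))
tied
```

If ties need different handling (for example, never preferring "unknown"), that rule would have to be added explicitly.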

This now gives the following number of distinct labels per ID:

tbl2 %>% group_by(ID) %>% summarise(n_distinct(label))
  ID    `n_distinct(label)`
  <chr>               <int>
1 A                       1
2 B                       1
3 C                       1
4 D                       3
5 E                       1
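The distinct-label counts show that the labels were collapsed, but not which label was kept. As a rough sketch, the same approach can be tried on a small made-up tibble (toy and relabelled are hypothetical names, not the data from the question) and inspected with count(). The ungroup() at the end is an optional addition so that later steps do not stay grouped by ID.

```r
library(dplyr)

# Small hypothetical data set: A has two labels, D has three, E has one
toy <- tibble(
  ID    = c("A", "A", "A", "D", "D", "D", "E"),
  label = c("alpha", "alpha", "unknown", "alpha", "beta", "unknown", "unknown")
)

relabelled <- toy %>%
  group_by(ID) %>%
  # Only IDs with exactly two distinct labels get the most frequent label
  mutate(label = if (n_distinct(label) == 2) names(which.max(table(label))) else label) %>%
  ungroup()

# One row per ID/label combination with its frequency
count(relabelled, ID, label)
```

Here only ID A should change, since D and E do not have exactly two distinct labels.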