How to Count String in a Column

Time:06-18

I am new to R and have a data frame with two fields. I need to count the number of times each element of the first field appears in the second field. The second field can contain more than one element, so the code below does not give the right answer. How can I modify it, or which function should I use here? The count for A1 should be 3, but it comes out as 1 because the occurrences of A1 in A1;A2 and A3;A1 are not recognized by this code. Thanks.

library(dplyr)  # provides filter(), used in the loop below

df0 <- data.frame(ID    = c("A1", "A2", "A3", "A4", "B1", "C1", "D1"),
                  Refer = c(" ", " ", "A1", "A1;A2", "A3;A1", "A2", "A2;C1")
)

n1 <- nrow(df0)

df1 = data.frame(matrix(
  vector(), 0, 2, dimnames=list(c(), c("ID","Count"))),
  stringsAsFactors=F)

for (i in 1:n1){
  
  id <- df0$ID[i]
  df2 <- filter(df0, Refer == id) # This assumes only a single ID can be there in Refer
  n2 <- nrow(df2) 
  df1[i,1] <- id
  df1[i,2] <- n2

}

CodePudding user response:

You are almost there. However, you should use grepl() instead of the exact comparison Refer == id, because grepl() also finds an ID inside a longer string such as A1;A2.

library(dplyr)
df0 <- data.frame (ID  = c("A1", "A2", "A3", "A4", "B1", "C1", "D1"),
                   Refer = c(" ", " ", "A1", "A1;A2", "A3;A1", "A2","A2;C1")
)


result <- lapply(df0$ID, function(x){
  n = df0 %>% filter(grepl(x, Refer)) %>% nrow
  data.frame(ID = x, count = n)
}) %>% 
  bind_rows
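
One caveat with plain grepl(x, Refer) is that it matches substrings, so an ID such as A1 would also be counted inside a longer ID such as A10. The sample data has no such IDs, but a minimal base R sketch that only counts whole ';'-separated entries could look like the following. The helper name count_whole_id is hypothetical, and the sketch assumes the IDs contain no regex metacharacters.

```r
# Sample data from the question
df0 <- data.frame(
  ID    = c("A1", "A2", "A3", "A4", "B1", "C1", "D1"),
  Refer = c(" ", " ", "A1", "A1;A2", "A3;A1", "A2", "A2;C1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count the elements of `refer` in which `id` appears as
# a whole entry, i.e. preceded by the start of the string or ';' and followed
# by ';' or the end of the string.
# Assumption: `id` contains no regex metacharacters.
count_whole_id <- function(id, refer) {
  sum(grepl(paste0("(^|;)", id, "(;|$)"), refer))
}

# sapply() names the result after the IDs because df0$ID is a character vector
counts <- sapply(df0$ID, count_whole_id, refer = df0$Refer)
result <- data.frame(ID = names(counts), Count = unname(counts))
```

For the sample data this should give the same counts as the grepl() version above; the two only differ when one ID is a substring of another.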

CodePudding user response:

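You could strsplit "Refer" at ; and unlist the result. Then create a factor from it, using "ID" as the levels, and simply table the result. Entries that are not in the levels (such as the blank " ") become NA and are dropped by table.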

table(factor(unlist(strsplit(df0$Refer, ';')), levels=df0$ID))
# A1 A2 A3 A4 B1 C1 D1 
#  3  3  1  0  0  1  0 
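
If a data frame with ID and Count columns is wanted, as in the question's df1, the table can be converted. This is a minimal sketch under two assumptions: the names refs and tab are illustrative only, and as.character() is added as a guard because strsplit() needs character input while data.frame() created factors by default before R 4.0.0.

```r
# Sample data from the question
df0 <- data.frame(
  ID    = c("A1", "A2", "A3", "A4", "B1", "C1", "D1"),
  Refer = c(" ", " ", "A1", "A1;A2", "A3;A1", "A2", "A2;C1"),
  stringsAsFactors = FALSE
)

# Split every Refer entry at ';' and flatten into one character vector
refs <- unlist(strsplit(as.character(df0$Refer), ";", fixed = TRUE))

# Values not listed in `levels` (here the blank " ") become NA,
# and table() leaves NA out by default
tab <- table(factor(refs, levels = df0$ID))

# Convert the one-dimensional table into a two-column data frame
df1 <- data.frame(ID = names(tab), Count = as.vector(tab))
```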

CodePudding user response:

Here is a tidyverse solution:

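library(dplyr)
library(tidyr)
library(stringr)

# Assumption: `pattern` was left undefined in the original answer;
# here it is taken to be a regex that matches any of the IDs
pattern <- paste(df0$ID, collapse = "|")

df0 %>% 
  separate_rows(Refer, sep = ";") %>%   # one row per referenced ID
  filter(str_detect(Refer, pattern)) %>% # drop blank entries
  count(Refer)
#   Refer     n
#   <chr> <int>
# 1 A1        3
# 2 A2        3
# 3 A3        1
# 4 C1        1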