R how to create a new column that summarizes the total of a string detect

Time:01-05

I have a data frame in R that looks like this:

structure(list(items = c("Apple", "Apple, Pear", "Apple, Pear, Banana"
)), row.names = c(NA, -3L), class = "data.frame")

I would like to create new columns for each item in the "items" column and count the frequency of each item. For example, I want to create an "Apple" column that contains the frequency of "Apple" in the "items" column, a "Pear" column that contains the frequency of "Pear" in the "items" column, and so on.

The final data frame should look like this:

structure(list(items = c("Apple", "Apple, Pear", "Apple, Pear, Banana"
), Apple = c(3, 3, 3), Pear = c(2, 2, 2), Banana = c(1, 1, 1)), row.names = c(NA, 
-3L), class = "data.frame")

I have tried using the mutate() and str_count() functions from the dplyr and stringr packages, but I'm not sure how to get the final data frame that I want.

Here is the code that I have tried so far:

items %>%
  mutate(Apple = str_count(items, "Apple"),
         Pear = str_count(items, "Pear"),
         Banana = str_count(items, "Banana"))

This gets me part way there, but I'm not sure how to create a new column for each item and count the frequency of each item. Can someone help me figure out how to do this in R?

CodePudding user response:

You can wrap str_count with sum:

items %>%
  mutate(Apple = sum(str_count(items, "Apple")),
         Pear = sum(str_count(items, "Pear")),
         Banana = sum(str_count(items, "Banana")))

                items Apple Pear Banana
1               Apple     3    2      1
2         Apple, Pear     3    2      1
3 Apple, Pear, Banana     3    2      1
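
To see why the sum() wrapper matters, here is a minimal sketch on a plain character vector (the names `items`, `per_row` and `total` are only for illustration). str_count() returns one count per element, which is what the attempt in the question puts into each row; sum() collapses those counts into a single total, and mutate() then repeats that one value in every row.

```r
library(stringr)

items <- c("Apple", "Apple, Pear", "Apple, Pear, Banana")

# One count per element: 0 1 1
per_row <- str_count(items, "Pear")

# A single column-wide total: 2
total <- sum(per_row)

per_row
total
```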

CodePudding user response:

This is especially useful when you have many rows and values. Here is a solution that separates the items into rows, counts them, combines the counts with the original data via cbind, pivots to wide format, and finally fills the NAs:

library(dplyr)
library(tidyr)
df %>% 
  separate_rows(items, sep='\\,') %>% 
  count(items1 = trimws(items)) %>% 
  cbind(df) %>% 
  pivot_wider(names_from = items1, values_from = n) %>% 
  fill(-items, .direction = "downup")

  items               Apple Banana  Pear
  <chr>               <int>  <int> <int>
1 Apple                   3      1     2
2 Apple, Pear             3      1     2
3 Apple, Pear, Banana     3      1     2
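
One caveat: the cbind(df) step pairs the count rows with the data rows by position, so it appears to depend on the number of distinct items being compatible with the number of rows (both are 3 here). A possible variant, sketched below on hypothetical four-row data, pivots the counts into a single row first and lets bind_cols() recycle that row. This assumes dplyr 1.0 or later, where bind_cols() recycles one-row inputs; `counts` and `result` are illustrative names.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: four rows but only three distinct items.
df <- data.frame(
  items = c("Apple", "Apple, Pear", "Apple, Pear, Banana", "Pear"),
  stringsAsFactors = FALSE
)

# Count each item across all rows, then spread the counts into one row.
counts <- df %>%
  separate_rows(items, sep = ",\\s*") %>%
  count(items) %>%
  pivot_wider(names_from = items, values_from = n)

# The one-row counts are recycled to every row of df.
result <- bind_cols(df, counts)
result
```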

CodePudding user response:

Using map: loop over the words of interest, use transmute to return a single column holding the total count of that word in the items column, and bind the output to the original data:

library(purrr)
library(dplyr)
library(stringr)  # provides str_count()
map_dfc(c("Apple", "Pear", "Banana"), ~ df1 %>%
    transmute(!! .x := sum(str_count(items, .x)))) %>%
  bind_cols(df1, .)

-output

                 items Apple Pear Banana
1               Apple     3    2      1
2         Apple, Pear     3    2      1
3 Apple, Pear, Banana     3    2      1

Another option is to split the 'items' column, tabulate the pieces with mtabulate, take the colSums, and cbind the totals to the original data:

library(qdapTools)
cbind(df1, as.list(colSums(mtabulate(strsplit(df1$items, ",\\s*")))))
                items Apple Banana Pear
1               Apple     3      1    2
2         Apple, Pear     3      1    2
3 Apple, Pear, Banana     3      1    2
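
If installing qdapTools is not an option, the same split-and-tabulate idea can be sketched in base R with table(). This is an alternative to mtabulate, not what the answer above uses, and `counts` is an illustrative name. Because it counts whole tokens after splitting, it does not need the item names up front.

```r
df1 <- data.frame(
  items = c("Apple", "Apple, Pear", "Apple, Pear, Banana"),
  stringsAsFactors = FALSE
)

# Split each row on commas (plus optional spaces) and tabulate whole tokens.
counts <- table(unlist(strsplit(df1$items, ",\\s*")))

# One new column per token; each length-1 count is recycled to every row.
df1[names(counts)] <- as.list(as.vector(counts))
df1
```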

CodePudding user response:

You can try the following: collapse the whole column into a single string and count the matches in it.

library(tidyverse)

df <- structure(list(items = c(
    "Apple", "Apple, Pear", "Apple, Pear, Banana"
  )),
  row.names = c(NA,-3L),
  class = "data.frame")


total_count <- function(x, word) {
  paste0(x, collapse = ", ") %>% 
    stringr::str_count(word)
}
  
df %>%
  mutate(Apple = total_count(items, "Apple"),
         Pear = total_count(items, "Pear"),
         Banana = total_count(items, "Banana"))

#>                 items Apple Pear Banana
#> 1               Apple     3    2      1
#> 2         Apple, Pear     3    2      1
#> 3 Apple, Pear, Banana     3    2      1

Created on 2023-01-04 with reprex v2.0.2
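
A note on the str_count() based approaches: the pattern is matched as a substring, so an item whose name is contained in another item's name gets overcounted. A minimal sketch with hypothetical items, using regex word boundaries as one possible guard (`loose` and `strict` are illustrative names):

```r
library(stringr)

# Hypothetical items where one name is a prefix of another.
items <- c("Pear", "Pearl, Pear")

# Substring match: "Pearl" is counted as a "Pear" as well.
loose <- sum(str_count(items, "Pear"))

# Whole-word match: \\b anchors the pattern at word boundaries.
strict <- sum(str_count(items, "\\bPear\\b"))

loose   # 3
strict  # 2
```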
