Home > Software engineering >  How to Count a Value inside a List that contains Separated Comma Value in a DataFrame (R)
How to Count a Value inside a List that contains Separated Comma Value in a DataFrame (R)

Time:10-27

I have a question :

I have this DF

#> # A tibble: 4 × 3
#>   family name      items
#>   <chr>  <chr>     <list>
#> 1 Kelly  Mark      book, ring, necklace
#> 2 Kelly  Scott     axe, camera, watch
#> 3 Quin   Tegan     book, camera, watch
#> 4 Quin   Sara      sword, fork, book

How to count each items in the list inside that dataframe into Total like this : count(book) = 3 count(camera) = 2 etc

should I pivot wider the entire item into new column? I'm really sorry if my question is too basic, since I really new into Data Processing

Thank You

#My Approach I tried to using pivot longer, but the columns become too many. The list contains hundreds of value, and it seems bother me to handle such a big data. I haven't yet tried another solution.

CodePudding user response:

From your question you mention that you do not want to pivot wider because of the amount of columns it would create. One alternative is to put the count into a list:

count <- as.list(table(unlist(df$items)))

count$book
[1] 3

Note: this is a count across all rows, which is what your post suggests you are looking for.

CodePudding user response:

Maybe something like this:

  df %>% 
    unnest(items) %>% 
    unnest(items) %>% 
    count(items, name="count")
  items    count
  <chr>    <int>
1 axe          1
2 book         3
3 camera       2
4 fork         1
5 necklace     1
6 ring         1
7 sword        1
8 watch        2

CodePudding user response:

library(dplyr)
library(tidyr)

df <- tibble(
  family = c("Kelly", "Kelly", "Quin", "Quin"), 
  name = c("Mark", "Scott", "Tegan", "Sara"),
  items = c("book, ring, necklace",
            "axe, camera, watch",
            "book, camera, watch",
            "sword, fork, book"))


df %>% separate(items, into = c("i1", 'i2', 'i3')) %>% 
  pivot_longer(cols = i1:i3, names_to = "item_order", values_to = "item") %>% 
  count(item, sort = TRUE)

CodePudding user response:

The other answers thus far use a character "items" column whereas the poster specified a list-column. The list-column may be unnested with tidyverse functions then counted as follows:

library(dplyr)

df <- tibble::tribble(
  ~family, ~name, ~items,
  "Kelly", "Mark", list("book", "ring", "necklace"),
  "Kelly", "Scott", list("axe", "camera", "watch"),
  "Quin", "Tegan", list("book", "camera", "watch"),
  "Quin", "Sara", list("sword", "fork", "book")
)

df %>%
  tidyr::unnest_longer(items) %>%
  count(items)
#> # A tibble: 8 × 2
#>   items        n
#>   <chr>    <int>
#> 1 axe          1
#> 2 book         3
#> 3 camera       2
#> 4 fork         1
#> 5 necklace     1
#> 6 ring         1
#> 7 sword        1
#> 8 watch        2

Created on 2022-10-26 with reprex v2.0.2

  • Related