Counting all elements, row wise in list column-CodePudding

My dataframe looks like this:

V1
c("cheese","bread","sugar","cream","milk","butter")
c("milk","butter","apples","cream","bread")
c("butter","milk","toffee")
c("cream","milk","butter","sugar")

I am trying to count the number of times each element appears and sum in a new column. I would like to end up with something like this:

V2         V3
cheese     1
bread      2
sugar      2
cream      3
milk       4
butter     4
apples     1
toffee     1

I have tried using the following code

counts <- unlist(V1, use.names = FALSE)
counts <- table(counts)

But for some reason the counts are wrong and values are being skipped.

CodePudding user response：

If I understand you correctly and your data is organized as provided below, then we could do it this way:

Using separate_rows will allow to bring all your strings in one row.
remove c and empty rows
Use fct_inorder from forcats package (it is in tidyverse) to keep the order as provided
then apply count with the name argument:

library(tidyverse)

df %>% 
  separate_rows(V1) %>% 
  filter(!(V1 == "c" | V1 == "")) %>% 
  mutate(V1 = fct_inorder(V1)) %>% 
  count(V1, name ="V3")

 V1        V3
  <fct>  <int>
1 cheese     1
2 bread      2
3 sugar      2
4 cream      3
5 milk       4
6 butter     4
7 apples     1
8 toffee     1

df <- structure(list(V1 = c("c(\"cheese\",\"bread\",\"sugar\",\"cream\",\"milk\",\"butter\")", 
"c(\"milk\",\"butter\",\"apples\",\"cream\",\"bread\")", "c(\"butter\",\"milk\",\"toffee\")", 
"c(\"cream\",\"milk\",\"butter\",\"sugar\")")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L))

CodePudding user response：

A couple of little issues with the question. Found it hard to reproduce exactly so took some liberties with the DF and present a couple of options that might help:

Option 1 - data in one column

library(tidyverse)

df <- data.frame(V1 = c("cheese","bread","sugar","cream","milk","butter",
"milk","butter","apples","cream","bread",
"butter","milk","toffee",
"cream","milk","butter","sugar"))


df <- df %>% dplyr::group_by(V1) %>% 
  summarise(
    V3 = n()
  )

Option 2 - data in columns - added NAs so it made a DF

library(tidyverse)

df <- data.frame(c("cheese","bread","sugar","cream","milk","butter"),
                 c("milk","butter","apples","cream","bread",NA),
                 c("butter","milk","toffee",NA,NA,NA),
                 c("cream","milk","butter","sugar",NA,NA))



df <- data.frame(V1=unlist(df)) %>% 
  select(V1) %>% 
  drop_na() %>% 
  group_by(V1) %>% 
  summarise(V3 = n())

hope this helps!