My dataframe looks like this:
V1
c("cheese","bread","sugar","cream","milk","butter")
c("milk","butter","apples","cream","bread")
c("butter","milk","toffee")
c("cream","milk","butter","sugar")
I am trying to count the number of times each element appears and sum in a new column. I would like to end up with something like this:
V2 V3
cheese 1
bread 2
sugar 2
cream 3
milk 4
butter 4
apples 1
toffee 1
I have tried using the following code
counts <- unlist(V1, use.names = FALSE)
counts <- table(counts)
But for some reason the counts are wrong and values are being skipped.
CodePudding user response:
If I understand you correctly and your data is organized as provided below, then we could do it this way:
Using separate_rows will allow to bring all your strings in one row.
remove c and empty rows
Use
fct_inorder
fromforcats
package (it is in tidyverse) to keep the order as providedthen apply count with the name argument:
library(tidyverse)
df %>%
separate_rows(V1) %>%
filter(!(V1 == "c" | V1 == "")) %>%
mutate(V1 = fct_inorder(V1)) %>%
count(V1, name ="V3")
V1 V3
<fct> <int>
1 cheese 1
2 bread 2
3 sugar 2
4 cream 3
5 milk 4
6 butter 4
7 apples 1
8 toffee 1
df <- structure(list(V1 = c("c(\"cheese\",\"bread\",\"sugar\",\"cream\",\"milk\",\"butter\")",
"c(\"milk\",\"butter\",\"apples\",\"cream\",\"bread\")", "c(\"butter\",\"milk\",\"toffee\")",
"c(\"cream\",\"milk\",\"butter\",\"sugar\")")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
CodePudding user response:
A couple of little issues with the question. Found it hard to reproduce exactly so took some liberties with the DF and present a couple of options that might help:
Option 1 - data in one column
library(tidyverse)
df <- data.frame(V1 = c("cheese","bread","sugar","cream","milk","butter",
"milk","butter","apples","cream","bread",
"butter","milk","toffee",
"cream","milk","butter","sugar"))
df <- df %>% dplyr::group_by(V1) %>%
summarise(
V3 = n()
)
Option 2 - data in columns - added NAs so it made a DF
library(tidyverse)
df <- data.frame(c("cheese","bread","sugar","cream","milk","butter"),
c("milk","butter","apples","cream","bread",NA),
c("butter","milk","toffee",NA,NA,NA),
c("cream","milk","butter","sugar",NA,NA))
df <- data.frame(V1=unlist(df)) %>%
select(V1) %>%
drop_na() %>%
group_by(V1) %>%
summarise(V3 = n())
hope this helps!