I have a data frame similar to df, and I would like to produce something that looks like df2: the values in the specified columns are compared row by row, each distinct combination is counted, and the count is placed in a new column of a new data frame.
For example, in df the combination 1, 5, 9 occurs 3 times.
df <- data.frame(col1 = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
                 col2 = c(5, 6, 7, 8, 5, 6, 7, 8, 5),
                 col3 = c(9, 10, 11, 12, 9, 10, 11, 13, 9))
df2 <- data.frame(col1 = c(1, 2, 3, 4, 4),
                  col2 = c(5, 6, 7, 8, 8),
                  col3 = c(9, 10, 11, 12, 13),
                  count = c(3, 2, 2, 1, 1))
I tried using dplyr:
df2 <- df %>%
distinct(col1,col2, col3) %>%
group_by(col3) %>%
summarize("count" = n())
but with no success.
CodePudding user response:
library(dplyr)
df %>%
count(col1,col2,col3)
col1 col2 col3 n
1 1 5 9 3
2 2 6 10 2
3 3 7 11 2
4 4 8 12 1
5 4 8 13 1
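If you want the new column to be called count, as in df2, rather than the default n, the name argument of count() should do it. A minimal sketch, reusing the question's df; the variable name result is just my own choice for illustration:

```r
library(dplyr)

# Same data as in the question
df <- data.frame(col1 = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
                 col2 = c(5, 6, 7, 8, 5, 6, 7, 8, 5),
                 col3 = c(9, 10, 11, 12, 9, 10, 11, 13, 9))

# Count each distinct (col1, col2, col3) combination and
# store the tally in a column named "count" instead of "n"
result <- df %>%
  count(col1, col2, col3, name = "count")

result
```

The rows come back sorted by the grouping columns, so they line up with the order shown in df2.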
CodePudding user response:
Is using plyr fine?
library(plyr)
ddply(df,.(col1,col2,col3),nrow)
Output:
col1 col2 col3 V1
1 1 5 9 3
2 2 6 10 2
3 3 7 11 2
4 4 8 12 1
5 4 8 13 1
CodePudding user response:
The best way to do it with dplyr is to use count(), as suggested in Vinícius Félix's response.
However, here is a fix using the syntax you started with. You were thinking in the right direction.
Library
library(dplyr)
Solution to your code
df %>%
# distinct(col1,col2, col3) # remove this line: it drops the duplicate rows you want to count
group_by(col1, col2, col3) %>% # group by every column that defines a combination
summarize(count = n()) %>% # quotes around the name are not needed, but are not wrong
ungroup() # summarize() leaves the result grouped by col1 and col2; ungroup() avoids surprises in later steps
Output
#> # A tibble: 5 × 4
#> col1 col2 col3 count
#> <dbl> <dbl> <dbl> <int>
#> 1 1 5 9 3
#> 2 2 6 10 2
#> 3 3 7 11 2
#> 4 4 8 12 1
#> 5 4 8 13 1
Created on 2022-12-03 with reprex v2.0.2
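As a possible alternative to the trailing ungroup(), summarize() accepts a .groups argument (assuming dplyr 1.0.0 or later). A small sketch with the question's df; df_counts is a name I made up for illustration:

```r
library(dplyr)

# Same data as in the question
df <- data.frame(col1 = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
                 col2 = c(5, 6, 7, 8, 5, 6, 7, 8, 5),
                 col3 = c(9, 10, 11, 12, 9, 10, 11, 13, 9))

# .groups = "drop" returns an ungrouped tibble,
# so no separate ungroup() step is needed
df_counts <- df %>%
  group_by(col1, col2, col3) %>%
  summarize(count = n(), .groups = "drop")

df_counts
```

This should also silence the message summarize() prints about how the output is grouped.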