Home > other >  R Counting Occurences in a Column of Data Frame, Grouped by Another Column
R Counting Occurences in a Column of Data Frame, Grouped by Another Column

Time:10-27

I basically have a data frame with a column of letters and a column of colors:

x <- data.frame(col1=c("a","b","a","c","d","d","c","a","b","c"),
                col2=c("red","orange","yellow","red","red","yellow","orange","yellow","red","orange"))

col1   col2
a      red
b      orange 
a      yellow
c      red
d      red
d      yellow
c      orange
a      yellow
b      red
c      orange

My goal is to create a second data frame that counts the number of occurences of each color in col2 of x for each letter in col1. Basically:

Letters   Occurences   Red   Orange  Yellow
a         3            1     0       2
b         2            1     1       0
c         3            1     2       0
d         2            1     0       1

Right now, I just brute forced it since there are only 3 factors of col2. I used:

df <- data.frame(Letters = levels(factor(x$col1)))
df$Occurences <- table(x$col1)
df$red <- table(factor(x$col1[x$col2=="red"],levels=levels(factor(x$col1))))
df$orange <- table(factor(x$col1[x$col2=="orange"],levels=levels(factor(x$col1))))
df$yellow <- table(factor(x$col1[x$col2=="yellow"],levels=levels(factor(x$col1))))

Is there an easier way to do this, as opposed to doing each column of df one by one? Especially with a data set that has a lot more than 3 factors?

CodePudding user response:

Use pivot_wider from tidyr

library(tidyr)
x %>% 
  pivot_wider(names_from = col2, values_from = col2, values_fn = "length", values_fill = 0)

Output:

# A tibble: 4 × 4
  col1    red orange yellow
  <chr> <int>  <int>  <int>
1 a         1      0      2
2 b         1      1      0
3 c         1      2      0
4 d         1      0      1

CodePudding user response:

 as.data.frame.matrix(addmargins(table(x), 2))

  orange red yellow Sum
a      0   1      2   3
b      1   1      0   2
c      2   1      0   3
d      0   1      1   2
  • Related