I basically have a data frame with a column of letters and a column of colors:
x <- data.frame(col1=c("a","b","a","c","d","d","c","a","b","c"),
col2=c("red","orange","yellow","red","red","yellow","orange","yellow","red","orange"))
col1 col2
a red
b orange
a yellow
c red
d red
d yellow
c orange
a yellow
b red
c orange
My goal is to create a second data frame that counts the number of occurences of each color in col2
of x
for each letter in col1
. Basically:
Letters Occurences Red Orange Yellow
a 3 1 0 2
b 2 1 1 0
c 3 1 2 0
d 2 1 0 1
Right now, I just brute forced it since there are only 3 factors of col2
. I used:
df <- data.frame(Letters = levels(factor(x$col1)))
df$Occurences <- table(x$col1)
df$red <- table(factor(x$col1[x$col2=="red"],levels=levels(factor(x$col1))))
df$orange <- table(factor(x$col1[x$col2=="orange"],levels=levels(factor(x$col1))))
df$yellow <- table(factor(x$col1[x$col2=="yellow"],levels=levels(factor(x$col1))))
Is there an easier way to do this, as opposed to doing each column of df
one by one? Especially with a data set that has a lot more than 3 factors?
CodePudding user response:
Use pivot_wider
from tidyr
library(tidyr)
x %>%
pivot_wider(names_from = col2, values_from = col2, values_fn = "length", values_fill = 0)
Output:
# A tibble: 4 × 4
col1 red orange yellow
<chr> <int> <int> <int>
1 a 1 0 2
2 b 1 1 0
3 c 1 2 0
4 d 1 0 1
CodePudding user response:
as.data.frame.matrix(addmargins(table(x), 2))
orange red yellow Sum
a 0 1 2 3
b 1 1 0 2
c 2 1 0 3
d 0 1 1 2