I need to create a data frame containing the frequency of each value of the categorical variables in an existing data frame. Fortunately, these variables are all coded as numbers from 1 to 5 rather than as text.
So I could create a new data frame whose first column contains the numbers 1 to 5 and whose remaining columns give, for each variable in the original data frame, the frequency of that number as a response.
For example, we have an original df defined as:
df1 <- data.frame(
  Z = c(4, 1, 2, 1, 5, 4, 2, 5, 1, 5),
  Y = c(5, 1, 5, 5, 2, 1, 4, 1, 3, 3),
  X = c(4, 2, 2, 1, 5, 1, 5, 1, 3, 2),
  W = c(2, 1, 4, 2, 3, 2, 4, 2, 1, 2),
  V = c(5, 1, 3, 3, 3, 3, 2, 4, 4, 1)
)
I would need a second df containing the following table (each cell is the value multiplied by the number of times it occurs in that column):
fq Z Y X W V
1 3 3 3 2 2
2 4 2 6 10 2
3 0 6 3 3 12
4 8 4 4 8 8
5 15 15 10 0 5
I saw some answers showing how to do something like this using plyr, but not in a systematic way. Can someone help me out?
CodePudding user response:
We may use
sapply(df1, function(x) tapply(x, factor(x, levels = 1:5), FUN = sum))
Z Y X W V
1 3 3 3 2 2
2 4 2 6 10 2
3 NA 6 3 3 12
4 8 4 4 8 8
5 15 15 10 NA 5
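Note that this returns NA, not 0, for values that never occur in a column (3 in Z, 5 in W). If zeros are wanted, as in the requested table, one possible follow-up is to replace the NA cells afterwards. This is only a sketch; the name `res` is introduced here for illustration.

```r
df1 <- data.frame(
  Z = c(4, 1, 2, 1, 5, 4, 2, 5, 1, 5),
  Y = c(5, 1, 5, 5, 2, 1, 4, 1, 3, 3),
  X = c(4, 2, 2, 1, 5, 1, 5, 1, 3, 2),
  W = c(2, 1, 4, 2, 3, 2, 4, 2, 1, 2),
  V = c(5, 1, 3, 3, 3, 3, 2, 4, 4, 1)
)

# Sum each value within each column; levels with no observations come back as NA
res <- sapply(df1, function(x) tapply(x, factor(x, levels = 1:5), FUN = sum))

# Replace the NA cells (values that never occur in a column) with 0
res[is.na(res)] <- 0
res
```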
CodePudding user response:
table(stack(df1)) * 1:5
ind
values Z Y X W V
1 3 3 3 2 2
2 4 2 6 10 2
3 0 6 3 3 12
4 8 4 4 8 8
5 15 15 10 0 5
If you need a data.frame, you could do:
as.data.frame.matrix(table(stack(df1)) * 1:5)
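To get the exact layout from the question, with the values 1 to 5 in a leading `fq` column, the row names can be moved into a column. The sketch below is one way to do that; `weighted` and `df2` are names chosen here for illustration. It assumes, as in the example data, that every value from 1 to 5 occurs somewhere in `df1`, since otherwise the table would have fewer than five rows and the multiplication by `1:5` would no longer line up.

```r
df1 <- data.frame(
  Z = c(4, 1, 2, 1, 5, 4, 2, 5, 1, 5),
  Y = c(5, 1, 5, 5, 2, 1, 4, 1, 3, 3),
  X = c(4, 2, 2, 1, 5, 1, 5, 1, 3, 2),
  W = c(2, 1, 4, 2, 3, 2, 4, 2, 1, 2),
  V = c(5, 1, 3, 3, 3, 3, 2, 4, 4, 1)
)

# Counts per value and column, with each row scaled by its value (1 to 5)
weighted <- as.data.frame.matrix(table(stack(df1)) * 1:5)

# Move the row names (the values 1 to 5) into a leading fq column
df2 <- cbind(fq = as.integer(rownames(weighted)), weighted)
rownames(df2) <- NULL
df2
```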
CodePudding user response:
Another possible solution, based on purrr::map_dfc:
library(tidyverse)
map_dfc(df1, ~ 1:5 * table(factor(.x, levels = 1:5)) %>% as.vector)
#> # A tibble: 5 × 5
#> Z Y X W V
#> <int> <int> <int> <int> <int>
#> 1 3 3 3 2 2
#> 2 4 2 6 10 2
#> 3 0 6 3 3 12
#> 4 8 4 4 8 8
#> 5 15 15 10 0 5