I have:
- a number m of categorical features (x1, x2, ... xm)
- 1 categorical feature (y)
- all in a dataframe (df).
I would like to have a function that gives a single table with all the cross-tabulations between each xi and y. For example:
- table1 = table(df$x1, df$y) ... tablem = table(df$xm, df$y)
- combine the tables with rbind
I'm almost there but it doesn't work.
CodePudding user response:
How about this:
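data(diamonds, package = "ggplot2")
# Cross-tabulate each x column against the y column (cut)
tabs <- lapply(diamonds[, c("color", "clarity")], \(x) {
  table(x, diamonds$cut)
})
# Stack the per-variable tables row-wise into a single matrix
do.call(rbind, tabs)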
#>      Fair Good Very Good Premium Ideal
#> D     163  662      1513    1603  2834
#> E     224  933      2400    2337  3903
#> F     312  909      2164    2331  3826
#> G     314  871      2299    2924  4884
#> H     303  702      1824    2360  3115
#> I     175  522      1204    1428  2093
#> J     119  307       678     808   896
#> I1    210   96        84     205   146
#> SI2   466 1081      2100    2949  2598
#> SI1   408 1560      3240    3575  4282
#> VS2   261  978      2591    3357  5071
#> VS1   170  648      1775    1989  3589
#> VVS2   69  286      1235     870  2606
#> VVS1   17  186       789     616  2047
#> IF      9   71       268     230  1212
Created on 2022-05-30 by the reprex package (v2.0.1)
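If you want this as a reusable function, something along these lines might work. This is only a sketch: `cross_tabs` and its arguments (`df`, `xs`, `y`) are names made up for illustration, and it assumes `xs` and `y` are column names that exist in `df`. It also prefixes each row name with the variable it came from, so levels shared by two variables (such as 0/1) stay distinguishable after the rbind.

```r
# Hypothetical helper (not from any package): cross-tabulate every column
# named in `xs` against the column named in `y`, then stack the tables.
cross_tabs <- function(df, xs, y) {
  tabs <- lapply(xs, function(x) {
    tab <- table(df[[x]], df[[y]])
    # Label each row with its source variable, e.g. "vs=0"
    rownames(tab) <- paste0(x, "=", rownames(tab))
    tab
  })
  # Row-bind the per-variable tables into one matrix
  do.call(rbind, tabs)
}

# Usage with a built-in dataset: three x variables against carb
res <- cross_tabs(mtcars, c("vs", "am", "gear"), "carb")
res
```

The result is a plain matrix with one row per level of each x variable and one column per level of y.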
CodePudding user response:
An example with mtcars, c("vs","am","gear") (your x's) vs "carb" (your y):
do.call(
  rbind,
  sapply(
    c("vs", "am", "gear"),
    function(x) {
      as.data.frame(table(mtcars[, x], mtcars$carb))
    },
    simplify = FALSE
  )
)
     Var1 Var2 Freq
vs.1    0    1    0
vs.2    1    1    7
vs.3    0    2    5
vs.4    1    2    5
vs.5    0    3    3
vs.6    1    3    0
vs.7    0    4    8
vs.8    1    4    2
...
Var1 is the value of the x variable named in the row name (e.g. vs), and Var2 is the value of y.
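If you would rather not parse row names like vs.1 to find out which variable a row belongs to, a variant of the same idea can store the variable name in its own column. This is a rough sketch, not the only way to do it: the column names `variable` and `level` and the object name `long` are my own choices for illustration.

```r
# Long-format variant: one row per (variable, level, carb) combination,
# with the source variable kept in an explicit column.
long <- do.call(rbind, lapply(c("vs", "am", "gear"), function(x) {
  d <- as.data.frame(table(mtcars[[x]], mtcars$carb),
                     stringsAsFactors = FALSE)
  names(d) <- c("level", "carb", "Freq")
  # Prepend the name of the x variable; it is recycled over all rows
  cbind(variable = x, d)
}))

head(long)
```

Filtering then becomes straightforward, for example `long[long$variable == "gear", ]`.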