I'm using tapply to calculate group means but I can't get from the output what those groupings are. The grouping is combination of two variables, Var1 and Var2. It would be easier to demonstrate with some code
set.seed(123)
df <- mtcars
df$VAR1 <- sample(c("A","B"), nrow(mtcars) , replace = TRUE)
df$VAR2 <- sample(c("X","Y"), nrow(mtcars) , replace = TRUE)
df_result <- data.frame(apply(df[,sapply(df,is.numeric)], 2, function(x) tapply(x, list(df$VAR1,df$VAR2),mean)) )
the output
> df_result
mpg cyl disp hp drat wt qsec vs am gear carb
1 22.3800 5.8 194.0500 126.80 3.74900 2.803000 18.44600 0.500 0.5 3.70 2.100
2 17.9900 6.8 284.0700 162.70 3.19900 3.645300 17.82700 0.400 0.2 3.30 2.600
3 18.9125 6.5 252.7875 168.50 3.74375 3.366125 17.16625 0.375 0.5 3.75 3.625
4 21.9750 5.0 144.9000 112.75 3.91500 2.885000 17.77500 0.500 0.5 4.50 3.500
As you can see I can't tell which row is which combination of Var1 and Var2. The 4 groups should be A-X, A-Y, B-X, B-Y. Does anyone know how to add that table? Any modifications or simplification to the code would be ideal. Is it even possible? Yes, I know it can be done more simply with "dplyr" and "aggregate", however I want to use sapply/tapply/lapply in some way to do this. Please simplify if i'm doing something wrong. Any help greatly appreciated. Thanks
CodePudding user response:
Honestly, I don't think this is the right problem to use the apply family of functions. If you want to learn about them there are various other examples that you can use.
Functions like aggregate
are built for this -
aggregate(.~VAR1 VAR2, df, mean)
Obviously, you can cut a paper with a sword but that is not what it is made for. Here's using tapply
-
tmp <- unique(df[c('VAR1', 'VAR2')])
rownames(tmp) <- NULL
cbind(tmp[with(tmp, order(VAR2, VAR1)), ],
sapply(df[,sapply(df,is.numeric)], function(x)
tapply(x, list(df$VAR1,df$VAR2),mean)))
CodePudding user response:
This should you get closer to what you want. (Not using tapply
though.) You could use outer()
to apply custom Vectorize
d functions to combinations of "VAR1"
and "VAR2"
. We also use outer
with paste
to identify the combinations.
v <- lapply(df[, c('VAR1', 'VAR2')], unique) ## get levels
nm <- c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am",
"gear", "carb") ## vector of columns to loop over
For the mean
we may apply colMeans
on columns subset nm
.
V_MEAN <- Vectorize(\(x, y) colMeans(df[df$VAR1 == x & df$VAR2 == y, nm]),
SIMPLIFY=F)
sapply(outer(v[[1]], v[[2]], V_MEAN), I) |>
`colnames<-`(outer(v[[1]], v[[2]], paste, sep='_'))
# A_X B_X A_Y B_Y
# mpg 22.380 17.9900 18.912500 21.975
# cyl 5.800 6.8000 6.500000 5.000
# disp 194.050 284.0700 252.787500 144.900
# hp 126.800 162.7000 168.500000 112.750
# drat 3.749 3.1990 3.743750 3.915
# wt 2.803 3.6453 3.366125 2.885
# qsec 18.446 17.8270 17.166250 17.775
# vs 0.500 0.4000 0.375000 0.500
# am 0.500 0.2000 0.500000 0.500
# gear 3.700 3.3000 3.750000 4.500
# carb 2.100 2.6000 3.625000 3.500
For weighted.mean
, we use an anonymous function in an sapply
, where we define the w=
.
set.seed(42)
df$wgt <- runif(nrow(df)) ## fabricate weights
V_W_MEAN <- Vectorize(\(x, y) {
dat <- df[df$VAR1 == x & df$VAR2 == y, ]
sapply(dat[, nm], \(z) weighted.mean(z, w=dat[, 'wgt']))
}, SIMPLIFY=F)
sapply(outer(v[[1]], v[[2]], V_W_MEAN), I) |>
`colnames<-`(outer(v[[1]], v[[2]], paste, sep='_'))
# A_X B_X A_Y B_Y
# mpg 21.4040177 18.7455432 17.4546812 20.9215362
# cyl 5.8600143 6.6403719 6.7614803 5.4227808
# disp 199.3270864 274.2125329 264.9109708 145.5146065
# hp 125.5585798 164.2029158 183.3016365 135.5210857
# drat 3.7055945 3.2798961 3.7334560 3.8744869
# wt 2.8267939 3.5926588 3.5684028 2.8345649
# qsec 18.4069666 17.8164896 16.9251482 16.9464657
# vs 0.4587740 0.4358106 0.2869283 0.3257468
# am 0.4255709 0.2440034 0.4766754 0.6742532
# gear 3.6095371 3.3727337 3.7589968 4.6742532
# carb 2.1471308 2.6620944 4.1912125 4.3421659
If you prefer lists as result, you can use lapply(outer...) |> setNames(outer(...))
.
Note: "R version 4.1.2 (2021-11-01)"
Data:
df <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4,
30.4, 33.9, 21.5, 15.5, 15.2, 13.3, 19.2, 27.3, 26, 30.4, 15.8,
19.7, 15, 21.4), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8,
8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4),
disp = c(160, 160, 108, 258, 360, 225, 360, 146.7, 140.8,
167.6, 167.6, 275.8, 275.8, 275.8, 472, 460, 440, 78.7, 75.7,
71.1, 120.1, 318, 304, 350, 400, 79, 120.3, 95.1, 351, 145,
301, 121), hp = c(110, 110, 93, 110, 175, 105, 245, 62, 95,
123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, 97, 150,
150, 245, 175, 66, 91, 113, 264, 175, 335, 109), drat = c(3.9,
3.9, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,
3.07, 3.07, 3.07, 2.93, 3, 3.23, 4.08, 4.93, 4.22, 3.7, 2.76,
3.15, 3.73, 3.08, 4.08, 4.43, 3.77, 4.22, 3.62, 3.54, 4.11
), wt = c(2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19,
3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2,
1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14,
1.513, 3.17, 2.77, 3.57, 2.78), qsec = c(16.46, 17.02, 18.61,
19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3, 18.9, 17.4, 17.6,
18, 17.98, 17.82, 17.42, 19.47, 18.52, 19.9, 20.01, 16.87,
17.3, 15.41, 17.05, 18.9, 16.7, 16.9, 14.5, 15.5, 14.6, 18.6
), vs = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1), am = c(1,
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1), gear = c(4, 4, 4, 3,
3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3,
3, 3, 4, 5, 5, 5, 5, 5, 4), carb = c(4, 4, 1, 1, 2, 1, 4,
2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2, 4, 2, 1,
2, 2, 4, 6, 8, 2), VAR1 = c("A", "A", "A", "B", "A", "B",
"B", "B", "A", "A", "B", "B", "B", "A", "B", "A", "B", "A",
"A", "A", "A", "B", "A", "A", "A", "A", "B", "B", "A", "B",
"A", "B"), VAR2 = c("X", "Y", "Y", "X", "X", "X", "X", "Y",
"X", "Y", "Y", "X", "X", "X", "X", "Y", "X", "X", "Y", "X",
"X", "X", "X", "Y", "Y", "X", "Y", "X", "X", "Y", "Y", "X"
)), row.names = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
"Hornet 4 Drive", "Hornet Sportabout", "Valiant", "Duster 360",
"Merc 240D", "Merc 230", "Merc 280", "Merc 280C", "Merc 450SE",
"Merc 450SL", "Merc 450SLC", "Cadillac Fleetwood", "Lincoln Continental",
"Chrysler Imperial", "Fiat 128", "Honda Civic", "Toyota Corolla",
"Toyota Corona", "Dodge Challenger", "AMC Javelin", "Camaro Z28",
"Pontiac Firebird", "Fiat X1-9", "Porsche 914-2", "Lotus Europa",
"Ford Pantera L", "Ferrari Dino", "Maserati Bora", "Volvo 142E"
), class = "data.frame")