I wish to order the columns of a dataset in order of decreasing column variance but I have had no luck in doing so. This is what I have so far:
og_data <- og_data[, sort(apply(og_data, 2, var), decreasing=TRUE)]
Now, I know this doesn't work, since sort(apply(og_data, 2, var), decreasing=TRUE) returns the variance values of the columns in order of decreasing variance. I have no idea how to extract the column indexes from this, which is what I would need to use. Any help would be much appreciated.
CodePudding user response:
Since you did not provide reproducible data, you can try the method below, which uses the built-in mtcars dataset:
# Sorting examples using the mtcars dataset.
# Note: these sort the rows; with() evaluates order() inside mtcars,
# which avoids the need for attach().
# sort rows by mpg
new_data <- mtcars[with(mtcars, order(mpg)), ]
# sort rows by mpg, then cyl
new_data <- mtcars[with(mtcars, order(mpg, cyl)), ]
# sort rows by mpg (ascending), then cyl (descending)
new_data <- mtcars[with(mtcars, order(mpg, -cyl)), ]
Hope this answers your question.
CodePudding user response:
Since the goal is to order the columns of the data frame by descending variance, we calculate the column variances and pass them to order(), which returns the column positions sorted by descending variance. We'll use mtcars to illustrate, given the absence of a minimal reproducible example:
mtcars[, order(apply(mtcars, 2, var), decreasing = TRUE)]
...and the output:
                     disp  hp  mpg  qsec cyl carb    wt gear drat vs am
Mazda RX4           160.0 110 21.0 16.46   6    4 2.620    4 3.90  0  1
Mazda RX4 Wag       160.0 110 21.0 17.02   6    4 2.875    4 3.90  0  1
Datsun 710          108.0  93 22.8 18.61   4    1 2.320    4 3.85  1  1
Hornet 4 Drive      258.0 110 21.4 19.44   6    1 3.215    3 3.08  1  0
Hornet Sportabout   360.0 175 18.7 17.02   8    2 3.440    3 3.15  0  0
Valiant             225.0 105 18.1 20.22   6    1 3.460    3 2.76  1  0
Duster 360          360.0 245 14.3 15.84   8    4 3.570    3 3.21  0  0
Merc 240D           146.7  62 24.4 20.00   4    2 3.190    4 3.69  1  0
Merc 230            140.8  95 22.8 22.90   4    2 3.150    4 3.92  1  0
Merc 280            167.6 123 19.2 18.30   6    4 3.440    4 3.92  1  0
Merc 280C           167.6 123 17.8 18.90   6    4 3.440    4 3.92  1  0
Merc 450SE          275.8 180 16.4 17.40   8    3 4.070    3 3.07  0  0
Merc 450SL          275.8 180 17.3 17.60   8    3 3.730    3 3.07  0  0
Merc 450SLC         275.8 180 15.2 18.00   8    3 3.780    3 3.07  0  0
Cadillac Fleetwood  472.0 205 10.4 17.98   8    4 5.250    3 2.93  0  0
Lincoln Continental 460.0 215 10.4 17.82   8    4 5.424    3 3.00  0  0
Chrysler Imperial   440.0 230 14.7 17.42   8    4 5.345    3 3.23  0  0
Fiat 128             78.7  66 32.4 19.47   4    1 2.200    4 4.08  1  1
Honda Civic          75.7  52 30.4 18.52   4    2 1.615    4 4.93  1  1
Toyota Corolla       71.1  65 33.9 19.90   4    1 1.835    4 4.22  1  1
Toyota Corona       120.1  97 21.5 20.01   4    1 2.465    3 3.70  1  0
Dodge Challenger    318.0 150 15.5 16.87   8    2 3.520    3 2.76  0  0
AMC Javelin         304.0 150 15.2 17.30   8    2 3.435    3 3.15  0  0
Camaro Z28          350.0 245 13.3 15.41   8    4 3.840    3 3.73  0  0
Pontiac Firebird    400.0 175 19.2 17.05   8    2 3.845    3 3.08  0  0
Fiat X1-9            79.0  66 27.3 18.90   4    1 1.935    4 4.08  1  1
Porsche 914-2       120.3  91 26.0 16.70   4    2 2.140    5 4.43  0  1
Lotus Europa         95.1 113 30.4 16.90   4    2 1.513    5 3.77  1  1
Ford Pantera L      351.0 264 15.8 14.50   8    4 3.170    5 4.22  0  1
Ferrari Dino        145.0 175 19.7 15.50   6    6 2.770    5 3.62  0  1
Maserati Bora       301.0 335 15.0 14.60   8    8 3.570    5 3.54  0  1
Volvo 142E          121.0 109 21.4 18.60   4    2 2.780    4 4.11  1  1
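The reason the original attempt failed is that sort() returns the sorted values, while order() returns the positions needed for indexing. A small sketch of the difference, using a made-up named vector v (not from the question) in place of real column variances:

```r
# Toy named vector standing in for per-column variances (values are made up)
v <- c(a = 2.5, b = 9.1, c = 0.4)

# sort() returns the values themselves, largest first
sorted_values <- sort(v, decreasing = TRUE)

# order() returns the positions of those values in the original vector
positions <- order(v, decreasing = TRUE)

print(sorted_values)
print(positions)

# Indexing with the positions picks the elements in sorted order
print(v[positions])
```

Because the positions returned by order() are valid column indexes, they can be used directly inside [ , ] on a data frame, whereas the variance values returned by sort() cannot.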
To cross-check the results, we'll sort and print the vector of variances:
#cross-check variances
variances <- apply(mtcars,2,var)
variances[order(variances,decreasing = TRUE)]
Notice that the ordering of the vector matches the ordering of the columns from the prior operation.
        disp           hp          mpg         qsec          cyl         carb
1.536080e+04 4.700867e+03 3.632410e+01 3.193166e+00 3.189516e+00 2.608871e+00
          wt         gear         drat           vs           am
9.573790e-01 5.443548e-01 2.858814e-01 2.540323e-01 2.489919e-01
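Applied to the data frame from the question, the same idea might look like the sketch below. The contents of og_data here are invented for illustration, since the question did not include the real data. It uses sapply() instead of apply() because apply() first converts the data frame to a matrix, which may misbehave if any column is non-numeric; this version assumes every column is numeric.

```r
# Hypothetical stand-in for the asker's og_data (values are made up)
og_data <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(10, 20, 30, 40),
  z = c(5, 5, 6, 5)
)

# One variance per column, computed without converting to a matrix
col_vars <- sapply(og_data, var)

# order() gives the column positions from largest to smallest variance;
# drop = FALSE keeps the result a data frame even with a single column
og_data <- og_data[, order(col_vars, decreasing = TRUE), drop = FALSE]

print(names(og_data))
```

If the real data contains non-numeric columns, one option is to compute the variances on the numeric columns only and decide separately where the remaining columns should go.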