For each column, rowname of top n values

Time:03-14

As the title says, I'd like to get the row names of the top N values in each column. I have a data frame with TV series characters as rows and one column per episode. To get a list of the most relevant characters, I think taking the three characters who speak the most in each episode might be a good approach. I have thought about looping through each column, ordering it, taking the names and appending them to an array, but there must be a more efficient way to do this. Thank you all very much in advance.

CodePudding user response:

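Here is one way to do it, using the mtcars dataset. order() gives the ascending ordering of each column's values, which is used to index the row names; tail() then keeps the last three, i.e. the names with the three largest values (the largest ends up in the last row). sapply() applies this to every column and, because each result has the same length, returns a character matrix with one column per variable (not a data.frame):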

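# Row names of the data frame, in row order
rn <- row.names(mtcars)
# For one column x: sort the names by ascending value, keep the last three
f <- function(x) tail(rn[order(x)], n = 3)
# Apply f to every column
sapply(mtcars, f)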
##      mpg              cyl                disp                  hp              
## [1,] "Lotus Europa"   "Pontiac Firebird" "Chrysler Imperial"   "Camaro Z28"    
## [2,] "Fiat 128"       "Ford Pantera L"   "Lincoln Continental" "Ford Pantera L"
## [3,] "Toyota Corolla" "Maserati Bora"    "Cadillac Fleetwood"  "Maserati Bora" 
##      drat             wt                    qsec            vs            
## [1,] "Ford Pantera L" "Cadillac Fleetwood"  "Toyota Corona" "Fiat X1-9"   
## [2,] "Porsche 914-2"  "Chrysler Imperial"   "Valiant"       "Lotus Europa"
## [3,] "Honda Civic"    "Lincoln Continental" "Merc 230"      "Volvo 142E"  
##      am              gear             carb            
## [1,] "Ferrari Dino"  "Ford Pantera L" "Ford Pantera L"
## [2,] "Maserati Bora" "Ferrari Dino"   "Ferrari Dino"  
## [3,] "Volvo 142E"    "Maserati Bora"  "Maserati Bora"
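
Since the goal in the question is a list of the most relevant characters, here is a minimal sketch of the same idea applied to a data frame shaped like the one described. The data (lines_spoken, the character names, the counts) and the helper top_n_names() are made up for illustration and are not from the question. The sketch uses order(decreasing = TRUE) with head() instead of tail(), so the largest value comes first, and then counts how often each name appears among the top n with table().

```r
# Hypothetical toy data: rows are characters, columns are episodes,
# values are the number of lines spoken (all invented for illustration).
lines_spoken <- data.frame(
  ep1 = c(40, 12, 33, 5),
  ep2 = c(10, 45, 20, 30),
  ep3 = c(25, 8, 50, 31),
  row.names = c("Alice", "Bob", "Carol", "Dave")
)

# top_n_names() is a hypothetical helper, not a base R function.
top_n_names <- function(df, n = 3) {
  rn <- row.names(df)
  # decreasing = TRUE puts the largest value first; head() keeps the first n names
  sapply(df, function(x) head(rn[order(x, decreasing = TRUE)], n))
}

# Character matrix: one column per episode, row 1 holds the top speaker
top <- top_n_names(lines_spoken, n = 2)
top

# How many episodes each character appears in among the top n
sort(table(top), decreasing = TRUE)
```

Note that with n = 1 every column yields a single name, so sapply() simplifies the result to a named character vector rather than a matrix. Ties are broken by row order, as in the original answer.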