How do I access only specific columns of the summary() output?

Time:06-23

Essentially, I want to use the summary() function in R on only specific columns of my df.

Basically doing this (using the cars df as an example):

cars_summary <- summary(cars)
speed_summary <- cars_summary$speed

When I try to do this I get an error saying:

$ operator is invalid for atomic vectors

What does that mean and is there a way to do this without sapply()?

Thanks!

CodePudding user response:

Up front: I think you can just call summary(cars$speed) to get what you want.

summary(cars$speed)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#     4.0    12.0    15.0    15.4    19.0    25.0 

If you want this for multiple columns, and speed is just one example, then try this:

cars_summary <- lapply(cars, summary)
cars_summary$speed
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#     4.0    12.0    15.0    15.4    19.0    25.0 
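
If only a few columns matter, the same idea can be restricted to them. This is a minimal sketch: the `wanted` vector and the `speed_mean` name are my own, purely for illustration. Because each list element is a named numeric vector, individual statistics can be pulled out by name and stay numeric.

```r
# Summarise only the selected columns (the selection is illustrative)
wanted <- c("speed", "dist")
cars_summary <- lapply(cars[wanted], summary)

# Each element is a named numeric vector, so $ works on the list
# and [[ ]] picks one statistic by its name
speed_mean <- cars_summary$speed[["Mean"]]
speed_mean
```

Since `cars_summary` is a list here rather than a matrix, `$` indexing works as the question originally intended.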

However, there are some other things going on.

  1. The column names of the summary matrix are padded with leading spaces so that the names appear centered over the stats. This is purely aesthetic, but it makes the names awkward to match exactly.

    cars_summary <- summary(cars)
    dimnames(cars_summary)
    # [[1]]
    # [1] "" "" "" "" "" ""
    # [[2]]
    # [1] "    speed" "     dist"
    
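  2. It's a matrix, and a matrix is an atomic vector with dimensions, so you cannot use $-indexing on it; that is what the error message is telling you. One would instead need to index by column with [, using either the padded name or a renamed column.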

    cars_summary[,"    speed"]
    #                                                                                                       
    # "Min.   : 4.0  " "1st Qu.:12.0  " "Median :15.0  " "Mean   :15.4  " "3rd Qu.:19.0  " "Max.   :25.0  " 
    
    ### or perhaps
    colnames(cars_summary) <- names(cars)
    cars_summary[,"speed"]
    #                                                                                                       
    # "Min.   : 4.0  " "1st Qu.:12.0  " "Median :15.0  " "Mean   :15.4  " "3rd Qu.:19.0  " "Max.   :25.0  " 
    
  3. Hrrmmm, it's a matrix, but it's a matrix of strings, as you can see above and here:

    ### back to the original cars_summary
    str(cars_summary)
    #  'table' chr [1:6, 1:2] "Min.   : 4.0  " "1st Qu.:12.0  " "Median :15.0  " "Mean   :15.4  " "3rd Qu.:19.0  " "Max.   :25.0  " ...
    #  - attr(*, "dimnames")=List of 2
    #   ..$ : chr [1:6] "" "" "" "" ...
    #   ..$ : chr [1:2] "    speed" "     dist"
    

    While one could certainly use regex patterns to extract those numbers, the strings only hold the rounded, formatted values, so there can be a loss of precision.
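
To make points 1 and 3 concrete, here is a rough sketch of parsing a number back out of the string matrix and comparing it with the directly computed value. It assumes `mtcars` (another built-in dataset, not part of the question) only because its mean has more digits than the formatted summary keeps; the names `s`, `mean_cell`, `parsed` and `exact` are mine.

```r
# mtcars is used only because its mean has more digits than
# the formatted summary shows
s <- summary(mtcars["mpg"])

# Strip the centering padding from the column names
colnames(s) <- trimws(colnames(s))

# Find the "Mean" cell and parse the number after the colon
mean_cell <- grep("^Mean", s[, "mpg"], value = TRUE)
parsed <- as.numeric(trimws(sub("^[^:]*:", "", mean_cell)))

# Compare with the value computed directly from the data
exact <- mean(mtcars$mpg)
parsed - exact   # nonzero: the string kept only a few significant digits
```

The difference is small, but it shows why computing the statistic from the column itself (or via lapply) is preferable to scraping the printed table.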
