Get the final entry in an S4 object (R)

I know for a df, I can easily do:

df[-1,]

but this does not seem to work for S4 objects (I am working with GRanges objects specifically, but that shouldn't matter). Is there some sort of -1 equivalent?

Is the solution just:

S4[[2]][length(S4)]

Example:

gr <- GRanges(
seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
score = 1:10,
GC = seq(1, 0, length=10))

where I want to select "slots" (?) b-j.

If it was a df I would do:

gr[2:-1,]

CodePudding user response：

To learn how to operate on GRanges objects, you should consult the methods described in ?GRanges. The output that you see when you print gr is generated by the show method:

show(gr)
## GRanges object with 10 ranges and 2 metadata columns:
##     seqnames    ranges strand |     score        GC
##        <Rle> <IRanges>  <Rle> | <integer> <numeric>
##   a     chr1   101-111      - |         1  1.000000
##   b     chr2   102-112      + |         2  0.888889
##   c     chr2   103-113      + |         3  0.777778
##   d     chr2   104-114      * |         4  0.666667
##   e     chr1   105-115      * |         5  0.555556
##   f     chr1   106-116      + |         6  0.444444
##   g     chr3   107-117      + |         7  0.333333
##   h     chr3   108-118      + |         8  0.222222
##   i     chr3   109-119      - |         9  0.111111
##   j     chr3   110-120      - |        10  0.000000

The output gives the impression that gr is a data frame, but it isn't: what you see has been extracted from the slot values (attributes) of gr and displayed rectangularly for your convenience.

names(gr)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

slotNames(gr)
## [1] "seqnames"        "ranges"          "strand"          "seqinfo"        
## [5] "elementMetadata" "elementType"     "metadata"       

gr@seqnames
## factor-Rle of length 10 with 4 runs
##   Lengths:    1    3    2    4
##   Values : chr1 chr2 chr1 chr3
## Levels(3): chr1 chr2 chr3

There are a few methods for subsetting gr. Don't expect them to behave exactly like the corresponding methods for data frames. To obtain a new GRanges object containing all but the first range (in this case a), you can do gr[-1L] or gr[-1L, ]:

gr[-1L]
## GRanges object with 9 ranges and 2 metadata columns:
##     seqnames    ranges strand |     score        GC
##        <Rle> <IRanges>  <Rle> | <integer> <numeric>
##   b     chr2   102-112      + |         2  0.888889
##   c     chr2   103-113      + |         3  0.777778
##   d     chr2   104-114      * |         4  0.666667
##   e     chr1   105-115      * |         5  0.555556
##   f     chr1   106-116      + |         6  0.444444
##   g     chr3   107-117      + |         7  0.333333
##   h     chr3   108-118      + |         8  0.222222
##   i     chr3   109-119      - |         9  0.111111
##   j     chr3   110-120      - |        10  0.000000
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths

identical(gr[-1L], gr[-1L, ])
## [1] TRUE

If the ranges have been assigned unique names, then you can also subset by name, e.g., with gr[names(gr)[-1L]], or gr[names(gr)[-1L], ].
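To tie this back to the original question, here is a minimal sketch of selecting the final range and ranges b through j, by position and by name. It assumes the GenomicRanges package from Bioconductor is installed and takes the strand values of the example to be "-", "+" and "*"; the variable names (n, last, b_to_j, and so on) are only illustrative.

```r
library(GenomicRanges)  # Bioconductor package, assumed to be installed

# Same example object as in the question
gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  score = 1:10,
  GC = seq(1, 0, length.out = 10)
)

n <- length(gr)              # number of ranges in gr

# By position
last         <- gr[n]        # only the final range
last_too     <- tail(gr, 1L) # the same range via tail()
b_to_j       <- gr[2:n]      # everything from the second range onward
all_but_last <- gr[-n]       # drop the final range instead of the first

# By name (works because the names are unique)
b_and_j  <- gr[c("b", "j")]
no_first <- gr[setdiff(names(gr), "a")]

names(last)
names(b_to_j)
```

Note that R has no -1 index meaning "the last element": a negative index drops elements, so the last element is reached with length(gr) or tail().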

The columns to the left of the vertical bar are stored in the like-named slots (seqnames and strand as run-length encodings, ranges as an IRanges object) and should be extracted using the like-named accessor functions rather than with @:

identical(gr@seqnames, seqnames(gr))
## [1] TRUE
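As a rough illustration of the accessor functions, the sketch below pulls the individual components out of gr and then picks the last value of each. It assumes GenomicRanges is installed and that the strand values in the example are "-", "+" and "*"; the variable names are hypothetical.

```r
library(GenomicRanges)  # Bioconductor package, assumed to be installed

# Same example object as in the question
gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  score = 1:10,
  GC = seq(1, 0, length.out = 10)
)

sn <- seqnames(gr)  # factor-Rle of chromosome names
st <- strand(gr)    # factor-Rle of strands
rg <- ranges(gr)    # IRanges object holding start, end and width

# Plain vectors are often easier to index than Rle objects
chrom  <- as.character(sn)
strnd  <- as.character(st)
starts <- start(gr)
ends   <- end(gr)
widths <- width(gr)

n <- length(gr)
chrom[n]   # chromosome of the final range
strnd[n]   # strand of the final range
starts[n]  # start of the final range
ends[n]    # end of the final range
```

Going through accessors keeps the code independent of how the class happens to store its data internally, which is why they are preferred over direct slot access.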

The columns to the right of the vertical bar are referred to as "metadata". They are stored together in slot elementMetadata, which you should extract using method mcols:

mcols(gr)
## DataFrame with 10 rows and 2 columns
##       score        GC
##   <integer> <numeric>
## a         1  1.000000
## b         2  0.888889
## c         3  0.777778
## d         4  0.666667
## e         5  0.555556
## f         6  0.444444
## g         7  0.333333
## h         8  0.222222
## i         9  0.111111
## j        10  0.000000

The metadata are stored in a DataFrame object. You will find that, with regard to subsetting, DataFrame is more faithful than GRanges to data.frame semantics. ?DataFrame explains differences.

mcols(gr)$score
## [1]  1  2  3  4  5  6  7  8  9 10
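Since the metadata live in a DataFrame, the data-frame style of row subsetting applies to mcols(gr) even though it does not apply to gr itself. A small sketch, again assuming GenomicRanges is installed and using illustrative variable names:

```r
library(GenomicRanges)  # Bioconductor package, assumed to be installed

# Same example object as in the question
gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  score = 1:10,
  GC = seq(1, 0, length.out = 10)
)

md <- mcols(gr)               # DataFrame with columns score and GC

rest     <- md[-1L, ]         # all metadata rows except the first
last_row <- md[nrow(md), ]    # metadata of the final range only

last_score <- md$score[nrow(md)]  # score of the final range

# gr$score is a shortcut for mcols(gr)$score
scores <- gr$score
```

Subsetting gr first and then calling mcols(), for example mcols(gr[-1L]), should give the same rows and keeps the ranges and their metadata in sync.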