Get the final entry in an S4 object in R

Time:02-16

I know for a df, I can easily do:

df[-1,]

but this does not seem to work for S4 objects (I am working with GRanges objects specifically, but that shouldn't matter). Is there some sort of -1 equivalent?

Is the solution just:

S4[[2]][length(S4)]

Example:

library(GenomicRanges)  # Bioconductor; also attaches IRanges and S4Vectors

gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  score = 1:10,
  GC = seq(1, 0, length = 10))

where I want to select "slots" (?) b-j.

If it was a df I would do:

gr[2:-1,]

Answer:

To learn how to operate on GRanges objects, you should consult the methods described in ?GRanges. The output that you see when you print gr is generated by the show method:

show(gr)
## GRanges object with 10 ranges and 2 metadata columns:
##     seqnames    ranges strand |     score        GC
##        <Rle> <IRanges>  <Rle> | <integer> <numeric>
##   a     chr1   101-111      - |         1  1.000000
##   b     chr2   102-112      + |         2  0.888889
##   c     chr2   103-113      + |         3  0.777778
##   d     chr2   104-114      * |         4  0.666667
##   e     chr1   105-115      * |         5  0.555556
##   f     chr1   106-116      + |         6  0.444444
##   g     chr3   107-117      + |         7  0.333333
##   h     chr3   108-118      + |         8  0.222222
##   i     chr3   109-119      - |         9  0.111111
##   j     chr3   110-120      - |        10  0.000000

The output gives the impression that gr is a data frame, but it isn't: what you see has been extracted from the slot values (attributes) of gr and displayed rectangularly for your convenience.

names(gr)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

slotNames(gr)
## [1] "seqnames"        "ranges"          "strand"          "seqinfo"        
## [5] "elementMetadata" "elementType"     "metadata"       

gr@seqnames
## factor-Rle of length 10 with 4 runs
##   Lengths:    1    3    2    4
##   Values : chr1 chr2 chr1 chr3
## Levels(3): chr1 chr2 chr3
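
As a rough illustration of what "slots" means here, the following sketch defines a toy S4 class in base R. The class name Track and its slots pos and label are hypothetical and have nothing to do with GenomicRanges. The only point is that `[` and length() work on an S4 object when the class author has defined methods for them, as the Bioconductor classes do.

```r
library(methods)

# Hypothetical toy class: two parallel slots, one entry per element
setClass("Track", representation(pos = "integer", label = "character"))

# length() and `[` work on Track objects only because methods are defined here
setMethod("length", "Track", function(x) length(x@pos))
setMethod("[", "Track", function(x, i, j, ..., drop = TRUE) {
  # subset every slot with the same index so the slots stay parallel
  new("Track", pos = x@pos[i], label = x@label[i])
})

tr <- new("Track", pos = 101:104, label = c("a", "b", "c", "d"))

rest <- tr[-1L]          # all but the first element
last <- tr[length(tr)]   # the final element

rest@label
## [1] "b" "c" "d"
last@label
## [1] "d"
```

Without the two setMethod() calls, tr[-1L] would fail, which is why subsetting behaviour differs from one S4 class to another.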

There are a few methods for subsetting gr. Don't expect them to behave exactly like the corresponding methods for data frames. To obtain a second GRanges object containing all but the first range (in this case a), you can do gr[-1L] or gr[-1L, ]:

gr[-1L]
## GRanges object with 9 ranges and 2 metadata columns:
##     seqnames    ranges strand |     score        GC
##        <Rle> <IRanges>  <Rle> | <integer> <numeric>
##   b     chr2   102-112      + |         2  0.888889
##   c     chr2   103-113      + |         3  0.777778
##   d     chr2   104-114      * |         4  0.666667
##   e     chr1   105-115      * |         5  0.555556
##   f     chr1   106-116      + |         6  0.444444
##   g     chr3   107-117      + |         7  0.333333
##   h     chr3   108-118      + |         8  0.222222
##   i     chr3   109-119      - |         9  0.111111
##   j     chr3   110-120      - |        10  0.000000
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths

identical(gr[-1L], gr[-1L, ])
## [1] TRUE
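
Since the question asks for a "-1 equivalent", it may help to spell out what negative indices mean in R: they drop elements rather than count from the end. The sketch below uses a plain character vector x as a stand-in for the ten range names, and the variable names are only illustrative.

```r
x <- head(letters, 10)           # stand-in for the names a..j

all_but_first <- x[-1]           # negative index drops element 1
last_only     <- x[length(x)]    # last element, by position
last_via_tail <- tail(x, 1)      # same element, via tail()
all_but_last  <- head(x, -1)     # negative n drops from the end

# 2:-1 expands to c(2, 1, 0, -1); mixing positive and negative
# subscripts is an error in R, so catch it instead of stopping
mixed <- tryCatch(x[2:-1], error = function(e) "error")

all_but_first
## [1] "b" "c" "d" "e" "f" "g" "h" "i" "j"
last_only
## [1] "j"
```

So an index such as 2:-1 selects "second through last" in some other languages, but in R it is not a valid subscript for a vector or a data frame.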

If the ranges have been assigned unique names, then you can also subset by name, e.g., with gr[names(gr)[-1L]] or gr[names(gr)[-1L], ].
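
A minimal sketch of both styles of subsetting, assuming the GenomicRanges package is installed. The object small is a hypothetical three-range version of gr, kept short so that the block runs on its own.

```r
library(GenomicRanges)

# Hypothetical cut-down version of gr with three named ranges
small <- GRanges(
  seqnames = c("chr1", "chr2", "chr2"),
  ranges = IRanges(start = 101:103, end = 111:113, names = c("a", "b", "c")),
  strand = c("-", "+", "*"),
  score = 1:3)

rest_by_pos  <- small[-1L]                  # drop the first range by position
rest_by_name <- small[names(small)[-1L]]    # same ranges, selected by name
last_by_pos  <- small[length(small)]        # the final range
last_by_tail <- tail(small, 1)              # the final range, via tail()

names(rest_by_name)
## [1] "b" "c"
names(last_by_pos)
## [1] "c"
```

Subsetting by name only makes sense when the names are unique, so positional indexing with length() is probably the safer default for "the last range".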

The columns to the left of the vertical bar are stored in slots of the same name (seqnames and strand as run-length encodings, ranges as an IRanges object) and are extracted using accessor functions of the same name:

identical(gr@seqnames, seqnames(gr))
## [1] TRUE

The columns to the right of the vertical bar are referred to as "metadata". They are stored together in slot elementMetadata, which you should extract using method mcols:

mcols(gr)
## DataFrame with 10 rows and 2 columns
##       score        GC
##   <integer> <numeric>
## a         1  1.000000
## b         2  0.888889
## c         3  0.777778
## d         4  0.666667
## e         5  0.555556
## f         6  0.444444
## g         7  0.333333
## h         8  0.222222
## i         9  0.111111
## j        10  0.000000

The metadata are stored in a DataFrame object. You will find that, with regard to subsetting, DataFrame is more faithful than GRanges to data.frame semantics. ?DataFrame explains differences.

mcols(gr)$score
## [1]  1  2  3  4  5  6  7  8  9 10
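
To show the data-frame-like behaviour of the metadata, here is a small sketch that builds a DataFrame directly, assuming the S4Vectors package is installed (it is attached along with GenomicRanges). The object df is a hypothetical three-row stand-in for mcols(gr).

```r
library(S4Vectors)

# Hypothetical stand-in for mcols(gr), with three rows named a..c
df <- DataFrame(
  score = 1:3,
  GC = seq(1, 0, length.out = 3),
  row.names = c("a", "b", "c"))

rest     <- df[-1, ]          # all rows but the first, as with a data.frame
last_row <- df[nrow(df), ]    # the final row

rownames(rest)
## [1] "b" "c"
last_row$score
## [1] 3
```

The row/column form df[i, ] carries over from data.frame, whereas for the GRanges object itself the single-index form gr[i] is the more natural one.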