I know for a df, I can easily do:
df[-1,]
but this does not seem to work for S4 objects (I am working with GRanges objects specifically, but that shouldn't matter). Is there some sort of -1 equivalent?
Is the solution just:
S4[[2]][length(S4)]
Example:
gr <- GRanges(
seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
score = 1:10,
GC = seq(1, 0, length=10))
where I want to select "slots" (?) b-j.
If it was a df I would do:
gr[2:-1,]
Answer:
To learn how to operate on GRanges objects, you should consult the methods described in ?GRanges. The output that you see when you print gr is generated by the show method:
show(gr)
## GRanges object with 10 ranges and 2 metadata columns:
## seqnames ranges strand | score GC
## <Rle> <IRanges> <Rle> | <integer> <numeric>
## a chr1 101-111 - | 1 1.000000
## b chr2 102-112 + | 2 0.888889
## c chr2 103-113 + | 3 0.777778
## d chr2 104-114 * | 4 0.666667
## e chr1 105-115 * | 5 0.555556
## f chr1 106-116 + | 6 0.444444
## g chr3 107-117 + | 7 0.333333
## h chr3 108-118 + | 8 0.222222
## i chr3 109-119 - | 9 0.111111
## j chr3 110-120 - | 10 0.000000
The output gives the impression that gr is a data frame, but it isn't: what you see has been extracted from the slot values (attributes) of gr and displayed rectangularly for your convenience.
names(gr)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
slotNames(gr)
## [1] "seqnames" "ranges" "strand" "seqinfo"
## [5] "elementMetadata" "elementType" "metadata"
gr@seqnames
## factor-Rle of length 10 with 4 runs
## Lengths: 1 3 2 4
## Values : chr1 chr2 chr1 chr3
## Levels(3): chr1 chr2 chr3
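The difference between how an S4 object prints and how it is stored can be sketched with a toy class in base R. The class Toy and its slots id and value are hypothetical names made up for this illustration and have nothing to do with GenomicRanges. The sketch only shows that the rectangular printout comes from a show method, and that [ works because a method was defined for it.

```r
library(methods)

# Hypothetical toy class for illustration only
setClass("Toy", slots = c(id = "character", value = "numeric"))

# show() controls printing; here it lays the slots out like a data frame
setMethod("show", "Toy", function(object) {
  print(data.frame(id = object@id, value = object@value))
})

# `[` is available for Toy objects only because this method exists
setMethod("[", "Toy", function(x, i, j, ..., drop = TRUE) {
  new("Toy", id = x@id[i], value = x@value[i])
})

toy <- new("Toy", id = c("a", "b", "c"), value = c(10, 20, 30))

toy                  # prints rectangularly
is.data.frame(toy)   # FALSE: it only looks like a data frame
toy[-1]@id           # "b" "c"
```

Whether negative or two-dimensional subscripts work on a given S4 class therefore depends entirely on the methods that its package defines.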
There are a few methods for subsetting gr. Don't expect them to behave exactly like the corresponding methods for data frames. To obtain a second GRanges object describing all but the first range (in this case a), you can do gr[-1L] or gr[-1L, ]:
gr[-1L]
## GRanges object with 9 ranges and 2 metadata columns:
## seqnames ranges strand | score GC
## <Rle> <IRanges> <Rle> | <integer> <numeric>
## b chr2 102-112 + | 2 0.888889
## c chr2 103-113 + | 3 0.777778
## d chr2 104-114 * | 4 0.666667
## e chr1 105-115 * | 5 0.555556
## f chr1 106-116 + | 6 0.444444
## g chr3 107-117 + | 7 0.333333
## h chr3 108-118 + | 8 0.222222
## i chr3 109-119 - | 9 0.111111
## j chr3 110-120 - | 10 0.000000
## -------
## seqinfo: 3 sequences from an unspecified genome; no seqlengths
identical(gr[-1L], gr[-1L, ])
## [1] TRUE
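The question also asks about the last element. As a minimal sketch, assuming the GenomicRanges package from Bioconductor is installed, positional subsetting with length(gr) covers that case; the variable names n, last, drop_last and b_to_j are only illustrative.

```r
library(GenomicRanges)

gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges   = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand   = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  score    = 1:10,
  GC       = seq(1, 0, length = 10))

n <- length(gr)        # number of ranges, here 10

last      <- gr[n]     # only the last range ("j")
drop_last <- gr[-n]    # all but the last range ("a" to "i")
b_to_j    <- gr[2:n]   # ranges "b" to "j", same selection as gr[-1L]

names(last)
names(drop_last)
names(b_to_j)

# head() and tail() accept a negative n, as they do for ordinary vectors
names(head(gr, -1L))   # all but the last range
names(tail(gr, -1L))   # all but the first range
```

Note that R has no Python-style "-1 means last" index: a negative subscript always drops elements, so the last element is reached through length(gr).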
If the ranges have been assigned unique names, then you can also subset by name, e.g., with gr[names(gr)[-1L]] or gr[names(gr)[-1L], ].
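Name-based subsetting can be sketched as follows, again assuming GenomicRanges is installed. The variable names keep, by_name and reordered are illustrative only.

```r
library(GenomicRanges)

gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges   = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand   = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  score    = 1:10,
  GC       = seq(1, 0, length = 10))

# Drop range "a" by name rather than by position
keep    <- setdiff(names(gr), "a")
by_name <- gr[keep]
names(by_name)

# The result follows the order of the character subscript
reordered <- gr[c("j", "b")]
names(reordered)
```

Selecting by name is more robust than selecting by position when the object may have been reordered or filtered earlier in a script.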
The columns to the left of the vertical bar are stored in the so-named slots (seqnames and strand as run-length encodings, ranges as an IRanges object) and extracted using the so-named methods:
identical(gr@seqnames, seqnames(gr))
## [1] TRUE
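As a short sketch of the accessor route, assuming GenomicRanges is installed, the following pulls the left-hand columns out as ordinary vectors. Using accessors rather than @ is generally the safer habit, since slot layout is an implementation detail of the class.

```r
library(GenomicRanges)

gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges   = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand   = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  score    = 1:10,
  GC       = seq(1, 0, length = 10))

# Expand the run-length encodings into plain character vectors
chrom <- as.character(seqnames(gr))
strnd <- as.character(strand(gr))

# Coordinates of each range as integer vectors
s <- start(gr)   # 101, 102, ..., 110
e <- end(gr)     # 111, 112, ..., 120
w <- width(gr)   # end - start + 1, so 11 for every range here

chrom
strnd
```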
The columns to the right of the vertical bar are referred to as "metadata". They are stored together in slot elementMetadata, which you should extract using method mcols:
mcols(gr)
## DataFrame with 10 rows and 2 columns
## score GC
## <integer> <numeric>
## a 1 1.000000
## b 2 0.888889
## c 3 0.777778
## d 4 0.666667
## e 5 0.555556
## f 6 0.444444
## g 7 0.333333
## h 8 0.222222
## i 9 0.111111
## j 10 0.000000
The metadata are stored in a DataFrame object. You will find that, with regard to subsetting, DataFrame is more faithful than GRanges to data.frame semantics. ?DataFrame explains the differences.
mcols(gr)$score
## [1] 1 2 3 4 5 6 7 8 9 10
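To tie this back to the df[-1,] idiom from the question, here is a minimal sketch of row subsetting on the metadata, assuming GenomicRanges is installed. The names md, md_rest and high are illustrative only.

```r
library(GenomicRanges)

gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges   = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand   = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  score    = 1:10,
  GC       = seq(1, 0, length = 10))

md <- mcols(gr)

# Row subsetting with an empty column subscript, as with a data.frame
md_rest <- md[-1L, ]
nrow(md_rest)        # 9
md_rest$score        # 2 3 4 5 6 7 8 9 10

# gr$score is a shortcut for mcols(gr)$score
gr$score

# Metadata columns can also drive the selection of ranges
high <- gr[gr$score > 5]
names(high)          # "f" "g" "h" "i" "j"
```

Subsetting mcols(gr) returns only the metadata, whereas subsetting gr itself keeps the ranges and their metadata together, which is usually what is wanted.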