I have a problem. i want to delete some rows from the large data set. The problem is I have data of every 30 sec but I only want to go up to per minute. So, I want to remove the rows which have 30 sec init. for better understanding, I am attaching an example with the expected outcome that I want.
time value
2021-11-04 05:57:00 0.0
2021-11-04 05:57:30 0.0
2021-11-04 05:58:00 0.0
2021-11-04 05:58:30 0.0
2021-11-04 05:59:00 0.0
2021-11-04 05:59:30 0.0
2021-11-04 06:00:00 0.0
2021-11-04 06:00:30 0.0
2021-11-04 06:01:00 0.0
2021-11-04 06:01:30 0.0
2021-11-04 06:02:00 0.0
2021-11-04 06:02:30 0.0
2021-11-04 06:03:00 0.0
2021-11-04 06:03:30 0.0
2021-11-04 06:04:00 0.0
2021-11-04 06:04:30 0.0
2021-11-04 06:05:00 0.0
2021-11-04 06:05:30 0.0
2021-11-04 06:06:00 0.0
2021-11-04 06:06:30 0.0
2021-11-04 06:07:00 0.0
2021-11-04 06:07:30 0.0
what I want to be like that
time value
2021-11-04 05:57:00 0.0
2021-11-04 05:58:00 0.0
2021-11-04 05:59:00 0.0
2021-11-04 06:00:00 0.0
2021-11-04 06:01:00 0.0
2021-11-04 06:02:00 0.0
2021-11-04 06:03:00 0.0
2021-11-04 06:04:00 0.0
2021-11-04 06:05:00 0.0
2021-11-04 06:06:00 0.0
2021-11-04 06:07:00 0.0
every row with 30 sec init should be deleted from the data set.
CodePudding user response:
You can first truncate the time and then remove duplicates. Since the 30 second elements are the non-unique elements, they get removed:
library(xts)
xts3 <- xts(x=rnorm(10), order.by=as.POSIXct(strptime("2021-11-04 05:57:00", "%Y-%m-%d %H:%M:%S") 1:10*30), born=as.POSIXct("1899-05-08"))
# Round observations in z to the next hour
index(xts3) <- as.POSIXct(trunc(index(xts3), units="mins"))
# Remove duplicate times in z
xts3_dup <- make.index.unique(xts3, drop = TRUE)
xts
2021-11-04 05:57:00 -0.19766541
2021-11-04 05:58:00 -0.00902353
2021-11-04 05:58:00 -2.56173420
2021-11-04 05:59:00 0.64355622
2021-11-04 05:59:00 -0.18794658
2021-11-04 06:00:00 0.03005718
2021-11-04 06:00:00 0.64367384
2021-11-04 06:01:00 0.74716446
2021-11-04 06:01:00 -0.29986731
2021-11-04 06:02:00 -0.57503711
> xts3_dup
[,1]
2021-11-04 05:57:00 -0.19766541
2021-11-04 05:58:00 -0.00902353
2021-11-04 05:59:00 0.64355622
2021-11-04 06:00:00 0.03005718
2021-11-04 06:01:00 0.74716446
2021-11-04 06:02:00 -0.57503711
CodePudding user response:
Assuming input x shown in the Note at the end,
1) use to.minutes
like this. Use 1 instead of 4 if you want to keep the :00 value instead of the :30 value.
to.minutes(x, indexAt = "startof")[, 4]
giving:
x.Close
2021-11-04 05:57:00 2
2021-11-04 05:58:00 4
2021-11-04 05:59:00 6
2021-11-04 06:00:00 8
2021-11-04 06:01:00 10
2021-11-04 06:02:00 12
2021-11-04 06:03:00 14
2021-11-04 06:04:00 16
2021-11-04 06:05:00 18
2021-11-04 06:06:00 20
2021-11-04 06:07:00 22
2) Another possibility is to grep the times that end in 30 and then adjust each time 30 seconds downwards.
xx <- x[grep("30$", time(x))]
time(xx) <- time(xx) - 30
If you want the values corresponding to :00 instead then use this:
x[grep("00$", time(x))]
3) Another approach is to use aggregate.zoo and then convert back to xts. Use head instead of tail if you want the :00 values instead or use mean instead of head or tail and omit the 1 if you want to take the average value per minute period.
as.xts(aggregate(x, as.POSIXct(sub("30$", "00", time(x))), tail, 1))
4) We could take every other element and then adjust the time 30 seconds downwards as before.
xx <- x[rep(c(FALSE, TRUE), length = nrow(x))]
time(xx) <- time(xx) - 30
If you want the :00 times instead swap TRUE and FALSE and omit the adjustment of times.
x[rep(c(TRUE, FALSE), length = nrow(x))]
Note
library(xts)
x <- structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22), .Dim = c(22L, 1L), index = structure(c(1636019820,
1636019850, 1636019880, 1636019910, 1636019940, 1636019970, 1636020000,
1636020030, 1636020060, 1636020090, 1636020120, 1636020150, 1636020180,
1636020210, 1636020240, 1636020270, 1636020300, 1636020330, 1636020360,
1636020390, 1636020420, 1636020450), tzone = "", tclass = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"))
> x
[,1]
2021-11-04 05:57:00 1
2021-11-04 05:57:30 2
2021-11-04 05:58:00 3
2021-11-04 05:58:30 4
2021-11-04 05:59:00 5
2021-11-04 05:59:30 6
2021-11-04 06:00:00 7
2021-11-04 06:00:30 8
2021-11-04 06:01:00 9
2021-11-04 06:01:30 10
2021-11-04 06:02:00 11
2021-11-04 06:02:30 12
2021-11-04 06:03:00 13
2021-11-04 06:03:30 14
2021-11-04 06:04:00 15
2021-11-04 06:04:30 16
2021-11-04 06:05:00 17
2021-11-04 06:05:30 18
2021-11-04 06:06:00 19
2021-11-04 06:06:30 20
2021-11-04 06:07:00 21
2021-11-04 06:07:30 22