how can we remove the rows from xts based on the seconds criteria-CodePudding

I have a problem. i want to delete some rows from the large data set. The problem is I have data of every 30 sec but I only want to go up to per minute. So, I want to remove the rows which have 30 sec init. for better understanding, I am attaching an example with the expected outcome that I want.

time                        value
2021-11-04 05:57:00         0.0
2021-11-04 05:57:30         0.0
2021-11-04 05:58:00         0.0
2021-11-04 05:58:30         0.0
2021-11-04 05:59:00         0.0
2021-11-04 05:59:30         0.0
2021-11-04 06:00:00         0.0
2021-11-04 06:00:30         0.0
2021-11-04 06:01:00         0.0
2021-11-04 06:01:30         0.0
2021-11-04 06:02:00         0.0
2021-11-04 06:02:30         0.0
2021-11-04 06:03:00         0.0
2021-11-04 06:03:30         0.0
2021-11-04 06:04:00         0.0
2021-11-04 06:04:30         0.0
2021-11-04 06:05:00         0.0
2021-11-04 06:05:30         0.0
2021-11-04 06:06:00         0.0
2021-11-04 06:06:30         0.0
2021-11-04 06:07:00         0.0
2021-11-04 06:07:30         0.0

what I want to be like that

time                        value
2021-11-04 05:57:00         0.0

2021-11-04 05:58:00         0.0

2021-11-04 05:59:00         0.0

2021-11-04 06:00:00         0.0

2021-11-04 06:01:00         0.0

2021-11-04 06:02:00         0.0

2021-11-04 06:03:00         0.0

2021-11-04 06:04:00         0.0

2021-11-04 06:05:00         0.0

2021-11-04 06:06:00         0.0

2021-11-04 06:07:00         0.0

every row with 30 sec init should be deleted from the data set.

CodePudding user response：

You can first truncate the time and then remove duplicates. Since the 30 second elements are the non-unique elements, they get removed:

library(xts)
xts3 <- xts(x=rnorm(10), order.by=as.POSIXct(strptime("2021-11-04 05:57:00", "%Y-%m-%d %H:%M:%S") 1:10*30), born=as.POSIXct("1899-05-08"))

# Round observations in z to the next hour
index(xts3) <- as.POSIXct(trunc(index(xts3), units="mins"))

# Remove duplicate times in z
xts3_dup <- make.index.unique(xts3, drop = TRUE)

xts
2021-11-04 05:57:00 -0.19766541
2021-11-04 05:58:00 -0.00902353
2021-11-04 05:58:00 -2.56173420
2021-11-04 05:59:00  0.64355622
2021-11-04 05:59:00 -0.18794658
2021-11-04 06:00:00  0.03005718
2021-11-04 06:00:00  0.64367384
2021-11-04 06:01:00  0.74716446
2021-11-04 06:01:00 -0.29986731
2021-11-04 06:02:00 -0.57503711

> xts3_dup
                           [,1]
2021-11-04 05:57:00 -0.19766541
2021-11-04 05:58:00 -0.00902353
2021-11-04 05:59:00  0.64355622
2021-11-04 06:00:00  0.03005718
2021-11-04 06:01:00  0.74716446
2021-11-04 06:02:00 -0.57503711

CodePudding user response：

Assuming input x shown in the Note at the end,

1) use to.minutes like this. Use 1 instead of 4 if you want to keep the :00 value instead of the :30 value.

to.minutes(x, indexAt = "startof")[, 4]

giving:

                    x.Close
2021-11-04 05:57:00       2
2021-11-04 05:58:00       4
2021-11-04 05:59:00       6
2021-11-04 06:00:00       8
2021-11-04 06:01:00      10
2021-11-04 06:02:00      12
2021-11-04 06:03:00      14
2021-11-04 06:04:00      16
2021-11-04 06:05:00      18
2021-11-04 06:06:00      20
2021-11-04 06:07:00      22

2) Another possibility is to grep the times that end in 30 and then adjust each time 30 seconds downwards.

xx <- x[grep("30$", time(x))]
time(xx) <- time(xx) - 30

If you want the values corresponding to :00 instead then use this:

x[grep("00$", time(x))]

3) Another approach is to use aggregate.zoo and then convert back to xts. Use head instead of tail if you want the :00 values instead or use mean instead of head or tail and omit the 1 if you want to take the average value per minute period.

as.xts(aggregate(x, as.POSIXct(sub("30$", "00", time(x))), tail, 1))

4) We could take every other element and then adjust the time 30 seconds downwards as before.

xx <- x[rep(c(FALSE, TRUE), length = nrow(x))]
time(xx) <- time(xx) - 30

If you want the :00 times instead swap TRUE and FALSE and omit the adjustment of times.

x[rep(c(TRUE, FALSE), length = nrow(x))]

Note

library(xts)
x <- structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 
16, 17, 18, 19, 20, 21, 22), .Dim = c(22L, 1L), index = structure(c(1636019820, 
1636019850, 1636019880, 1636019910, 1636019940, 1636019970, 1636020000, 
1636020030, 1636020060, 1636020090, 1636020120, 1636020150, 1636020180, 
1636020210, 1636020240, 1636020270, 1636020300, 1636020330, 1636020360, 
1636020390, 1636020420, 1636020450), tzone = "", tclass = c("POSIXct", 
"POSIXt")), class = c("xts", "zoo"))

> x
                    [,1]
2021-11-04 05:57:00    1
2021-11-04 05:57:30    2
2021-11-04 05:58:00    3
2021-11-04 05:58:30    4
2021-11-04 05:59:00    5
2021-11-04 05:59:30    6
2021-11-04 06:00:00    7
2021-11-04 06:00:30    8
2021-11-04 06:01:00    9
2021-11-04 06:01:30   10
2021-11-04 06:02:00   11
2021-11-04 06:02:30   12
2021-11-04 06:03:00   13
2021-11-04 06:03:30   14
2021-11-04 06:04:00   15
2021-11-04 06:04:30   16
2021-11-04 06:05:00   17
2021-11-04 06:05:30   18
2021-11-04 06:06:00   19
2021-11-04 06:06:30   20
2021-11-04 06:07:00   21
2021-11-04 06:07:30   22