Home > front end >  how can we remove the rows from xts based on the seconds criteria
how can we remove the rows from xts based on the seconds criteria

Time:01-05

I have a problem. i want to delete some rows from the large data set. The problem is I have data of every 30 sec but I only want to go up to per minute. So, I want to remove the rows which have 30 sec init. for better understanding, I am attaching an example with the expected outcome that I want.

time                        value
2021-11-04 05:57:00         0.0
2021-11-04 05:57:30         0.0
2021-11-04 05:58:00         0.0
2021-11-04 05:58:30         0.0
2021-11-04 05:59:00         0.0
2021-11-04 05:59:30         0.0
2021-11-04 06:00:00         0.0
2021-11-04 06:00:30         0.0
2021-11-04 06:01:00         0.0
2021-11-04 06:01:30         0.0
2021-11-04 06:02:00         0.0
2021-11-04 06:02:30         0.0
2021-11-04 06:03:00         0.0
2021-11-04 06:03:30         0.0
2021-11-04 06:04:00         0.0
2021-11-04 06:04:30         0.0
2021-11-04 06:05:00         0.0
2021-11-04 06:05:30         0.0
2021-11-04 06:06:00         0.0
2021-11-04 06:06:30         0.0
2021-11-04 06:07:00         0.0
2021-11-04 06:07:30         0.0

what I want to be like that

time                        value
2021-11-04 05:57:00         0.0

2021-11-04 05:58:00         0.0

2021-11-04 05:59:00         0.0

2021-11-04 06:00:00         0.0

2021-11-04 06:01:00         0.0

2021-11-04 06:02:00         0.0

2021-11-04 06:03:00         0.0

2021-11-04 06:04:00         0.0

2021-11-04 06:05:00         0.0

2021-11-04 06:06:00         0.0

2021-11-04 06:07:00         0.0

every row with 30 sec init should be deleted from the data set.

CodePudding user response:

You can first truncate the time and then remove duplicates. Since the 30 second elements are the non-unique elements, they get removed:

library(xts)
xts3 <- xts(x=rnorm(10), order.by=as.POSIXct(strptime("2021-11-04 05:57:00", "%Y-%m-%d %H:%M:%S") 1:10*30), born=as.POSIXct("1899-05-08"))

# Round observations in z to the next hour
index(xts3) <- as.POSIXct(trunc(index(xts3), units="mins"))

# Remove duplicate times in z
xts3_dup <- make.index.unique(xts3, drop = TRUE)

xts
2021-11-04 05:57:00 -0.19766541
2021-11-04 05:58:00 -0.00902353
2021-11-04 05:58:00 -2.56173420
2021-11-04 05:59:00  0.64355622
2021-11-04 05:59:00 -0.18794658
2021-11-04 06:00:00  0.03005718
2021-11-04 06:00:00  0.64367384
2021-11-04 06:01:00  0.74716446
2021-11-04 06:01:00 -0.29986731
2021-11-04 06:02:00 -0.57503711

> xts3_dup
                           [,1]
2021-11-04 05:57:00 -0.19766541
2021-11-04 05:58:00 -0.00902353
2021-11-04 05:59:00  0.64355622
2021-11-04 06:00:00  0.03005718
2021-11-04 06:01:00  0.74716446
2021-11-04 06:02:00 -0.57503711

CodePudding user response:

Assuming input x shown in the Note at the end,

1) use to.minutes like this. Use 1 instead of 4 if you want to keep the :00 value instead of the :30 value.

to.minutes(x, indexAt = "startof")[, 4]

giving:

                    x.Close
2021-11-04 05:57:00       2
2021-11-04 05:58:00       4
2021-11-04 05:59:00       6
2021-11-04 06:00:00       8
2021-11-04 06:01:00      10
2021-11-04 06:02:00      12
2021-11-04 06:03:00      14
2021-11-04 06:04:00      16
2021-11-04 06:05:00      18
2021-11-04 06:06:00      20
2021-11-04 06:07:00      22

2) Another possibility is to grep the times that end in 30 and then adjust each time 30 seconds downwards.

xx <- x[grep("30$", time(x))]
time(xx) <- time(xx) - 30

If you want the values corresponding to :00 instead then use this:

x[grep("00$", time(x))]

3) Another approach is to use aggregate.zoo and then convert back to xts. Use head instead of tail if you want the :00 values instead or use mean instead of head or tail and omit the 1 if you want to take the average value per minute period.

as.xts(aggregate(x, as.POSIXct(sub("30$", "00", time(x))), tail, 1))

4) We could take every other element and then adjust the time 30 seconds downwards as before.

xx <- x[rep(c(FALSE, TRUE), length = nrow(x))]
time(xx) <- time(xx) - 30

If you want the :00 times instead swap TRUE and FALSE and omit the adjustment of times.

x[rep(c(TRUE, FALSE), length = nrow(x))]

Note

library(xts)
x <- structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 
16, 17, 18, 19, 20, 21, 22), .Dim = c(22L, 1L), index = structure(c(1636019820, 
1636019850, 1636019880, 1636019910, 1636019940, 1636019970, 1636020000, 
1636020030, 1636020060, 1636020090, 1636020120, 1636020150, 1636020180, 
1636020210, 1636020240, 1636020270, 1636020300, 1636020330, 1636020360, 
1636020390, 1636020420, 1636020450), tzone = "", tclass = c("POSIXct", 
"POSIXt")), class = c("xts", "zoo"))

> x
                    [,1]
2021-11-04 05:57:00    1
2021-11-04 05:57:30    2
2021-11-04 05:58:00    3
2021-11-04 05:58:30    4
2021-11-04 05:59:00    5
2021-11-04 05:59:30    6
2021-11-04 06:00:00    7
2021-11-04 06:00:30    8
2021-11-04 06:01:00    9
2021-11-04 06:01:30   10
2021-11-04 06:02:00   11
2021-11-04 06:02:30   12
2021-11-04 06:03:00   13
2021-11-04 06:03:30   14
2021-11-04 06:04:00   15
2021-11-04 06:04:30   16
2021-11-04 06:05:00   17
2021-11-04 06:05:30   18
2021-11-04 06:06:00   19
2021-11-04 06:06:30   20
2021-11-04 06:07:00   21
2021-11-04 06:07:30   22
  •  Tags:  
  • Related