Parsing 8-byte hex integers in R

Time:01-02

Context: I'm trying to write a parser in R for the track files exported by my preferred GPS app. The files use a custom binary specification, with latitude, longitude, and timestamps all represented as 8-byte, big-endian, signed integers. For example, latitude is degrees north x10^7. This is the first time I've messed around with parsing raw/hex representations.

Let's say I have 3 raw integers:

# Should parse as 377441228
lat = as.raw(c(0x00, 0x00, 0x00, 0x00, 0x16, 0x7f, 0x4b, 0xcc))
# Should parse as -1195899101
lon = as.raw(c(0xff, 0xff, 0xff, 0xff, 0xb8, 0xb8, 0x07, 0x23))
# Should parse as 1618678057000
time = as.raw(c(0x00, 0x00, 0x01, 0x78, 0xe0, 0xbb, 0x08, 0x28))

The first approach I found was to use readBin(). This works correctly for lat and lon but not time:

# 377441228: correct
readBin(lat, integer(), size = 8, 
        signed = TRUE, endian = 'big')
# -1195899101: correct
readBin(lon, integer(), size = 8, 
        signed = TRUE, endian = 'big')
# -524613592: incorrect
readBin(time, integer(), size = 8, 
        signed = TRUE, endian = 'big')

The next approach was to do some string wrangling and pass through as.numeric(). This worked for lat and time, but not lon:

library(magrittr)
parser = function(hex) {
    hex |> 
        paste(collapse = '') %>%
        paste0('0x', .) |> 
        as.numeric()
}
# 377441228: correct
parser(lat)
# 1.844674e+19: incorrect
parser(lon)
# 1.618678e+12: correct
parser(time)

How do I parse these?

Answer:

You can use this small function, which needs only base R. It converts the raw data into bits, arranges them into a single big-endian vector of 1s and 0s, then interprets that vector as a two's complement number. The result is a double, since base R has no 64-bit integer type, so values beyond ±2^53 cannot be represented exactly.

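parser <- function(x) {
  # One column per byte, most significant bit first; bits[1] is the sign bit
  bits <- sapply(x, function(y) rev(as.integer(rawToBits(y))))
  # Folding the sign bit in before weighting keeps negative values exact
  # (subtracting 2^63 from a sum near 2^63 would round to a multiple of 1024)
  sum((bits[-1] - bits[1]) * 2^(62:0)) - bits[1]
}

Testing, we have:

lat  <- as.raw(c(0x00, 0x00, 0x00, 0x00, 0x16, 0x7f, 0x4b, 0xcc))
lon  <- as.raw(c(0xff, 0xff, 0xff, 0xff, 0xb8, 0xb8, 0x07, 0x23))
time <- as.raw(c(0x00, 0x00, 0x01, 0x78, 0xe0, 0xbb, 0x08, 0x28))

parser(lat)
#> [1] 377441228
parser(lon)
#> [1] -1195899101
parser(time)
#> [1] 1.618678e+12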
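
The same two's complement logic can also be applied to whole bytes instead of individual bits. The sketch below is one possible variant rather than a definitive implementation, and the name `parse_int64_bytes` is made up for illustration. For negative inputs it sums the complemented bytes, so every intermediate value stays small enough for a double to hold exactly, assuming the result itself is below 2^53 in magnitude.

```r
# Hypothetical helper: parse one 8-byte, big-endian, signed integer into a double
parse_int64_bytes <- function(x) {
  stopifnot(is.raw(x), length(x) == 8)
  bytes <- as.integer(x)                 # each byte as a value in 0..255
  negative <- bytes[1] >= 128L           # top bit of the first byte is the sign bit
  if (negative) bytes <- 255L - bytes    # bitwise NOT of every byte
  magnitude <- sum(bytes * 256^(7:0))    # big-endian place values
  # Two's complement: a negative value equals -(NOT(x) + 1)
  if (negative) -(magnitude + 1) else magnitude
}

lat  <- as.raw(c(0x00, 0x00, 0x00, 0x00, 0x16, 0x7f, 0x4b, 0xcc))
lon  <- as.raw(c(0xff, 0xff, 0xff, 0xff, 0xb8, 0xb8, 0x07, 0x23))
time <- as.raw(c(0x00, 0x00, 0x01, 0x78, 0xe0, 0xbb, 0x08, 0x28))

parse_int64_bytes(lat)   # 377441228
parse_int64_bytes(lon)   # -1195899101
parse_int64_bytes(time)  # 1618678057000 (prints as 1.618678e+12 by default)
```

Wrapping the last value in `format(..., scientific = FALSE)` shows all of its digits.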

If you prefer a vectorized version that will handle multiple values at once, you can do:

parser <- function(x) {
  sapply(x, function(z) {
    bits <- sapply(z, function(y) rev(as.integer(rawToBits(y))))
    # Fold the sign bit in before weighting so negative values stay exact
    sum((bits[-1] - bits[1]) * 2^(62:0)) - bits[1]
  })
}

parser(list(lat, lon, time))
#> [1]     377441228   -1195899101 1618678057000
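
In an actual track file the fields would presumably arrive as one contiguous raw vector rather than as separate objects. As a rough sketch, assuming the stream is nothing but consecutive 8-byte records (the real file layout isn't described here), it could be cut into chunks before being handed to the list-based `parser()`. `split_int64` is a hypothetical helper name.

```r
# Hypothetical helper: cut a raw vector into a list of 8-byte chunks
split_int64 <- function(stream) {
  stopifnot(is.raw(stream), length(stream) %% 8 == 0)
  n_chunks <- length(stream) %/% 8
  # Chunk k covers bytes (8k - 7) through 8k
  lapply(seq_len(n_chunks), function(k) stream[(8 * k - 7):(8 * k)])
}

lat  <- as.raw(c(0x00, 0x00, 0x00, 0x00, 0x16, 0x7f, 0x4b, 0xcc))
lon  <- as.raw(c(0xff, 0xff, 0xff, 0xff, 0xb8, 0xb8, 0x07, 0x23))
time <- as.raw(c(0x00, 0x00, 0x01, 0x78, 0xe0, 0xbb, 0x08, 0x28))

stream <- c(lat, lon, time)    # 24 bytes, as they might appear back to back
chunks <- split_int64(stream)  # list of three raw vectors, 8 bytes each
# parser(chunks) would then parse each chunk in turn
```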

Created on 2023-01-01 with reprex v2.0.2
