Home > Blockchain >  Referring to the row above when using mutate() in R
Referring to the row above when using mutate() in R

Time:04-05

I want to create a new variable in a dataframe that refers to the value of the same new variable in the row above. Here's an example of what I want to do:

A horse is in a field divided into four zones. The horse is wearing a beacon that signals every minute, and the signal is picked up by one of four sensors, one for each zone. The field has a fence that runs most of the way down the middle, such that the horse can pass easily between zones 2 and 3, but to get between zones 1 and 4 it has to go via 2 and 3. The horse cannot jump over the fence.

          |________________|
          |                |       
sensor 2  | X      |       | sensor 3
          |        |       |
          |        |       |
          |        |       |
sensor 1  |       Y|       | sensor 4
          |        |       |
          |----------------| 

In the schematic above, if the horse is at position X, it will be picked up by sensor 2. If the horse is near the middle fence at position Y, however, it may be picked up by either sensor 1 or sensor 4, the ranges of which overlap slightly.

In the toy example below, I have a dataframe where I have location data each minute for 20 minutes. In most cases, the horse moves one zone at a time, but in several instances, it switches back and forth between zone 1 and 4. This should be impossible: the horse cannot jump the fence, and neither can it run around in the space of a minute.

I therefore want to calculate a new variable in the dataset that provides the "true" location of the animal, accounting for the impossibility of travelling between 1 and 4.

Here's the data:

library(tidyverse)
library(reshape2)

example <- data.frame(time = seq(as.POSIXct("2022-01-01 09:00:00"), 
                              as.POSIXct("2022-01-01 09:20:00"),
                              by="1 mins"),
                   location = c(1,1,1,1,2,3,3,3,4,4,4,3,3,2,1,1,4,1,4,1,4))
example

Create two new variables: "prevloc" is where the animal was in the previous minute, and "diffloc" is the number differences between the animal's current and previous location.

example <- example %>% mutate(prevloc = lag(location),
                        diffloc = abs(location - prevloc))
example

Next, just change the first value of "diffloc" from NA to zero:

example <- example %>% mutate(diffloc = ifelse(is.na(diffloc), 0, diffloc))
example

Now we have a dataframe where diffloc is either 0 (animal didn't move), 1 (animal moved one zone), or 3 (animal apparently moved from zone 1 to zone 4 or vice versa). Where diffloc = 3, I want to create a "true" location taking account of the fact that such a change in location is impossible.

In my example, the animal went from zone 1 -> 4 -> 1 -> 4 -> 1 -> 4. Based on the fact that the animal started in zone 1, my assumption is that the animal just stayed in zone 1 the whole time.

My attempt to solve this below, which doesn't work:

example <- example %>% 
  mutate(returnloc = ifelse(diffloc < 3, location, lag(returnloc)))

I wonder whether anyone can help me to solve this? I've been trying for a couple of days and haven't even got close...

Best wishes,

Adam

CodePudding user response:

One possible solution is to, when diffloc == 3, look at the previous value that is not 1 nor 4. If it is 2, then the horse is certainly in 1 afterwards, if it is 3, then the horse is certainly in 4.

example %>% 
  mutate(trueloc = case_when(diffloc == 3 & sapply(seq(row_number()), \(i) tail(location[1:i][!location %in% c(1, 4)], 1) == 2) ~ 1,
                             diffloc == 3 & sapply(seq(row_number()), \(i) tail(location[1:i][!location %in% c(1, 4)], 1) == 3) ~ 4,
                             T ~ location))


                  time location prevloc diffloc trueloc
1  2022-01-01 09:00:00        1      NA       0       1
2  2022-01-01 09:01:00        1       1       0       1
3  2022-01-01 09:02:00        1       1       0       1
4  2022-01-01 09:03:00        1       1       0       1
5  2022-01-01 09:04:00        2       1       1       2
6  2022-01-01 09:05:00        3       2       1       3
7  2022-01-01 09:06:00        3       3       0       3
8  2022-01-01 09:07:00        3       3       0       3
9  2022-01-01 09:08:00        4       3       1       4
10 2022-01-01 09:09:00        4       4       0       4
11 2022-01-01 09:10:00        4       4       0       4
12 2022-01-01 09:11:00        3       4       1       3
13 2022-01-01 09:12:00        3       3       0       3
14 2022-01-01 09:13:00        2       3       1       2
15 2022-01-01 09:14:00        1       2       1       1
16 2022-01-01 09:15:00        1       1       0       1
17 2022-01-01 09:16:00        4       1       3       1
18 2022-01-01 09:17:00        1       4       3       1
19 2022-01-01 09:18:00        4       1       3       1
20 2022-01-01 09:19:00        1       4       3       1
21 2022-01-01 09:20:00        4       1       3       1

CodePudding user response:

Here is an approach using a funciton containing a for-loop. You cannot rely on diff, because this will not pick up sequences of (wrong) zone 4's.
c(1,1,4,4,4,1,1,1) should be converted to c(1,1,1,1,1,1,1,1) if I understand your question correctly.

So, you need to iterate (I think).

library(data.table)
# custom sample data set
example <- data.frame(time = seq(as.POSIXct("2022-01-01 09:00:00"), 
                                 as.POSIXct("2022-01-01 09:20:00"),
                                 by="1 mins"),
                      location = c(1,1,1,1,2,3,3,3,4,4,4,3,3,2,1,1,4,4,4,1,4))
# Make it a data.table, make sure the time is ordered
setDT(example, key = "time")
# function
fixLocations <- function(x) {
  for(i in 2:length(x)) {
    if (abs(x[i] - x[i-1]) > 1) x[i] <- x[i-1]
  }
  return(x)
}

NB that this function only works if the location in the first row is correct. If it start with (wrong) zone 4's, it will go awry.

example[, locationNew := fixLocations(location)][]

#                   time location locationNew
# 1: 2022-01-01 09:00:00        1           1
# 2: 2022-01-01 09:01:00        1           1
# 3: 2022-01-01 09:02:00        1           1
# 4: 2022-01-01 09:03:00        1           1
# 5: 2022-01-01 09:04:00        2           2
# 6: 2022-01-01 09:05:00        3           3
# 7: 2022-01-01 09:06:00        3           3
# 8: 2022-01-01 09:07:00        3           3
# 9: 2022-01-01 09:08:00        4           4
#10: 2022-01-01 09:09:00        4           4
#11: 2022-01-01 09:10:00        4           4
#12: 2022-01-01 09:11:00        3           3
#13: 2022-01-01 09:12:00        3           3
#14: 2022-01-01 09:13:00        2           2
#15: 2022-01-01 09:14:00        1           1
#16: 2022-01-01 09:15:00        1           1
#17: 2022-01-01 09:16:00        4           1
#18: 2022-01-01 09:17:00        4           1
#19: 2022-01-01 09:18:00        4           1
#20: 2022-01-01 09:19:00        1           1
#21: 2022-01-01 09:20:00        4           1
#                   time location locationNew
  • Related