I have data on bird individuals and their feeding locations. The feeding locations move, and so I want to create a variable that calculates the distance from yesterday's feeding location to "today's" feeding options.
Here is a reprex that illustrates the problem. The bird column holds the individual bird IDs, and feedLoc holds the possible feeding locations for each day. date is the date of the observation. h (horizontal) and v (vertical) are the coordinates of each feeding location on a grid. Finally, bp indicates whether that individual was identified at the feeding location (1) or not (0).
library(tibble)

reprex <- tibble(bird = c("A", "A", "A", "B", "B", "B", "C", "C"),
                 feedLoc = c("x", "y", "x", "x", "y", "x", "y", "z"),
                 date = as.Date(c("2020-05-10", "2020-05-11", "2020-05-11",
                                  "2020-05-24", "2020-05-25", "2020-05-25",
                                  "2020-05-22", "2020-05-23")),
                 h = c(100, 123, 45, 75, 89, 64, 99, 101),
                 v = c(89, 23, 65, 92, 29, 90, 120, 34),
                 bp = c(1, 1, 0, 1, 0, 1, 1, 0))
Which produces this:
# A tibble: 8 × 6
bird feedLoc date h v bp
<chr> <chr> <date> <dbl> <dbl> <dbl>
1 A x 2020-05-10 100 89 1
2 A y 2020-05-11 123 23 1
3 A x 2020-05-11 45 65 0
4 B x 2020-05-24 75 92 1
5 B y 2020-05-25 89 29 0
6 B x 2020-05-25 64 90 1
7 C y 2020-05-22 99 120 1
8 C z 2020-05-23 101 34 0
I want to make a new variable that, for each bird individual, calculates the distance from yesterday's feeding choice (the row where bp == 1 and the date is exactly one day before the current row's date) to each of the current day's feeding location options, using the coordinate data. How would I do this? Thanks!
I initially tried to group by bird and feedLoc IDs, arrange by date, and then lag the h and v variables so that I could use the distance formula to calculate the distance from yesterday's ant swarm choice. The issue is that, in the data set, the previous row after arranging is not always exactly "yesterday".
CodePudding user response:
Try something like this dplyr approach. It first restricts the manipulation to rows where bp == 1, then checks whether the feeding location differs from the previous row's and whether the previous date is exactly one day earlier (lag(date) == date - 1), and if so calculates the difference for h and v. After all that, it joins the bp == 0 rows back in and re-sorts (this approach saves a more convoluted case_when statement). If this isn't exactly what you need, post an example of the desired output and I will edit. Good luck!
library(dplyr)
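reprex %>%
  group_by(bird) %>%
  # work only on the rows where the bird was actually observed
  filter(bp == 1) %>%
  arrange(date) %>%
  # differences are filled in only when the location changed and the
  # previous chosen row is exactly one day earlier; otherwise NA
  mutate(h_change = case_when(
           feedLoc != lag(feedLoc) & lag(date) == date - 1 ~ h - lag(h)),
         v_change = case_when(
           feedLoc != lag(feedLoc) & lag(date) == date - 1 ~ v - lag(v))) %>%
  # bring the bp == 0 rows back in
  right_join(reprex, by = c("bird", "feedLoc", "date", "h", "v", "bp")) %>%
  ungroup() %>%
  arrange(bird, date)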
Output:
# bird feedLoc date h v bp h_change v_change
# <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A x 2020-05-10 100 89 1 NA NA
# 2 A y 2020-05-11 123 23 1 23 -66
# 3 A x 2020-05-11 45 65 0 NA NA
# 4 B x 2020-05-24 75 92 1 NA NA
# 5 B x 2020-05-25 64 90 1 NA NA
# 6 B y 2020-05-25 89 29 0 NA NA
# 7 C y 2020-05-22 99 120 1 NA NA
# 8 C z 2020-05-23 101 34 0 NA NA
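If a single straight-line distance is wanted rather than separate components, h_change and v_change can be combined with the Euclidean distance formula. A minimal sketch, using a small stand-in tibble for the first three rows of the output above (changes and dist are hypothetical names, not part of the original code):

```r
library(dplyr)
library(tibble)

# Stand-in for the h_change / v_change columns of bird A in the output above
changes <- tibble(h_change = c(NA, 23, NA),
                  v_change = c(NA, -66, NA))

# Euclidean distance from the two components; NA components give NA
out <- changes %>%
  mutate(dist = sqrt(h_change^2 + v_change^2))
```

Rows where either component is NA keep an NA distance, so the date and location checks from case_when carry through unchanged.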
CodePudding user response:
You can temporarily arrange() by desc(bp), then compute distances conditional on bp == 1:
library(dplyr)
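reprex %>%
  # put the chosen (bp == 1) rows first so lag() looks at the previous choice
  arrange(desc(bp)) %>%
  group_by(bird) %>%
  mutate(dist = ifelse(
    bp == 1,
    # Euclidean distance to the previous chosen location
    sqrt((h - lag(h))^2 + (v - lag(v))^2),
    NA
  )) %>%
  ungroup() %>%
  arrange(bird, date)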
# A tibble: 8 × 7
bird feedLoc date h v bp dist
<chr> <chr> <date> <dbl> <dbl> <dbl> <dbl>
1 A x 2020-05-10 100 89 1 NA
2 A y 2020-05-11 123 23 1 69.9
3 A x 2020-05-11 45 65 0 NA
4 B x 2020-05-24 75 92 1 NA
5 B x 2020-05-25 64 90 1 11.2
6 B y 2020-05-25 89 29 0 NA
7 C y 2020-05-22 99 120 1 NA
8 C z 2020-05-23 101 34 0 NA
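Note that lag() here picks up the previous bp == 1 row for the bird whatever the gap in dates, and the bp == 0 options get no distance. If the goal is the distance from yesterday's choice to every one of today's options, one possible alternative is a self-join on a date shifted by one day. This is only a sketch: it assumes at most one bp == 1 row per bird per day, and chosen, h_prev, v_prev, and dist_from_prev are names made up for illustration.

```r
library(dplyr)
library(tibble)

reprex <- tibble(
  bird    = c("A", "A", "A", "B", "B", "B", "C", "C"),
  feedLoc = c("x", "y", "x", "x", "y", "x", "y", "z"),
  date    = as.Date(c("2020-05-10", "2020-05-11", "2020-05-11",
                      "2020-05-24", "2020-05-25", "2020-05-25",
                      "2020-05-22", "2020-05-23")),
  h  = c(100, 123, 45, 75, 89, 64, 99, 101),
  v  = c(89, 23, 65, 92, 29, 90, 120, 34),
  bp = c(1, 1, 0, 1, 0, 1, 1, 0)
)

# Chosen locations, with the date moved forward one day so that each
# choice lines up with the *next* day's options in the join below.
# Assumption: at most one bp == 1 row per bird per day.
chosen <- reprex %>%
  filter(bp == 1) %>%
  mutate(date = date + 1) %>%
  select(bird, date, h_prev = h, v_prev = v)

# Every option row gets yesterday's chosen coordinates (or NA if the
# bird has no recorded choice exactly one day earlier).
with_dist <- reprex %>%
  left_join(chosen, by = c("bird", "date")) %>%
  mutate(dist_from_prev = sqrt((h - h_prev)^2 + (v - v_prev)^2))
```

Because the match is on bird and the exact shifted date, rows with no choice recorded on the previous calendar day simply end up with NA rather than a distance to some older location.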