I have two datasets: One of species in my study and how many times I observed them, and another larger dataset that is a broader database of observations.
I want to mutate a column in my short dataset for "lowest latitude observed" (or highest or mean) from values in the other dataset but I can't quite figure out how to match them in a mutate.
set.seed(1)
# my dataset. sightings isn't important for this, just important that the solution doesn't mess up existing columns.
fake_spp_df <- data.frame(
species = c("a","b","c","d",'e'),
sightings = c(5,1,2,6,3)
)
# broader occurrence dataset
fake_spp_occurrences <- data.frame(
species = rep(c("a","b","c","d",'f'),each=20), # notice spp "f" - not all species are the same between datasets
latitude = runif(100, min = 0, max = 80),
longitude = runif(100, min=-90, max = -55)
)
# so I know to find one species min, i could do this:
min(fake_spp_occurrences$latitude[fake_spp_occurrences$species == "a"]),
# but I want to do that in a mutate()
# this was my failed attempt:
fake_spp_df %>%
mutate(lowest_lat = min(fake_spp_occurrences$latitude[fake_spp_occurrences$species == species])
)
desired result:
> fake_spp_df
species sightings lowest_lat max_lat median_lat
1 a 5 1.7 etc...
2 b 1 5.3
3 c 2 2.2
4 d 6 4.3
5 e 3 NA
thinking this could also be done witth some kind of join or merge, but I'm not sure.
thanks!
CodePudding user response:
summarise
the fake_spp_occurrences
dataset and then perform the join.
library(dplyr)
fake_spp_occurrences %>%
group_by(species) %>%
summarise(lowest_lat = min(latitude),
max_lat = max(latitude),
median_lat = median(latitude)) %>%
right_join(fake_spp_df, by = 'species')
# species lowest_lat max_lat median_lat sightings
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 a 4.94 79.4 48.1 5
#2 b 1.07 74.8 35.7 1
#3 c 1.87 68.9 41.9 2
#4 d 6.74 76.8 38.2 6
#5 e NA NA NA 3