Home > Back-end >  mutate to merge minimum value from different df in R
mutate to merge minimum value from different df in R

Time:09-18

I have two datasets: One of species in my study and how many times I observed them, and another larger dataset that is a broader database of observations.

I want to mutate a column in my short dataset for "lowest latitude observed" (or highest or mean) from values in the other dataset but I can't quite figure out how to match them in a mutate.

set.seed(1)
# my dataset. sightings isn't important for this, just important that the solution doesn't mess up existing columns. 
fake_spp_df <- data.frame(
  species = c("a","b","c","d",'e'),
  sightings = c(5,1,2,6,3)
) 

# broader occurrence dataset
fake_spp_occurrences <- data.frame(
  species = rep(c("a","b","c","d",'f'),each=20), # notice spp "f" - not all species are the same between datasets
  latitude = runif(100, min = 0, max = 80),
  longitude = runif(100, min=-90, max = -55)
)


# so I know to find one species min, i could do this:
min(fake_spp_occurrences$latitude[fake_spp_occurrences$species == "a"]),
# but I want to do that in a mutate()

# this was my failed attempt:
fake_spp_df %>%
  mutate(lowest_lat = min(fake_spp_occurrences$latitude[fake_spp_occurrences$species == species])
)

desired result:

> fake_spp_df

  species sightings lowest_lat  max_lat  median_lat
1       a         5     1.7      etc...
2       b         1     5.3
3       c         2     2.2
4       d         6     4.3
5       e         3     NA

thinking this could also be done witth some kind of join or merge, but I'm not sure.

thanks!

CodePudding user response:

summarise the fake_spp_occurrences dataset and then perform the join.

library(dplyr)

fake_spp_occurrences %>%
  group_by(species) %>%
  summarise(lowest_lat = min(latitude), 
            max_lat = max(latitude), 
            median_lat = median(latitude)) %>%
  right_join(fake_spp_df, by = 'species')

#  species lowest_lat max_lat median_lat sightings
#  <chr>        <dbl>   <dbl>      <dbl>     <dbl>
#1 a             4.94    79.4       48.1         5
#2 b             1.07    74.8       35.7         1
#3 c             1.87    68.9       41.9         2
#4 d             6.74    76.8       38.2         6
#5 e            NA       NA         NA           3
  • Related