I have a dataframe of coordinates for different studies that have been conducted. The studies are either experiment
or observation
however at some locations both experiment
AND observation
occur. For these sites, I would like to create a new study
category called both
. How can I do this using dplyr
?
Example Data
df1 <- data.frame(matrix(ncol = 4, nrow = 6))
colnames(df1)[1:4] <- c("value", "study", "lat","long")
df1$value <- c(1,1,2,3,4,4)
df1$study <- rep(c('experiment','observation'),3)
df1$lat <- c(37.541290,37.541290,38.936604,29.9511,51.509865,51.509865)
df1$long <- c(-77.434769,-77.434769,-119.986649,-90.0715,-0.118092,-0.118092)
df1
value study lat long
1 1 experiment 37.54129 -77.434769
2 1 observation 37.54129 -77.434769
3 2 experiment 38.93660 -119.986649
4 3 observation 29.95110 -90.071500
5 4 experiment 51.50986 -0.118092
6 4 observation 51.50986 -0.118092
Note that the value
above is duplicated when study
has experiment
AND observation
.
The ideal output would look like this
value study lat long
1 1 both 37.54129 -77.434769
2 2 experiment 38.93660 -119.986649
3 3 observation 29.95110 -90.071500
4 4 both 51.50986 -0.118092
CodePudding user response:
We can replace those 'value' cases where both experiment and observation is available to 'both' and get the distinct
library(dplyr)
df1 %>%
group_by(value) %>%
mutate(study = if(all(c("experiment", "observation") %in% study))
"both" else study) %>%
ungroup %>%
distinct
-output
# A tibble: 4 × 4
value study lat long
<dbl> <chr> <dbl> <dbl>
1 1 both 37.5 -77.4
2 2 experiment 38.9 -120.
3 3 observation 30.0 -90.1
4 4 both 51.5 -0.118