site <- c(1,1,2,2,3,3,4,4)
rep <- c(1,2,1,2,1,2,1,2)
sp.1 <- c(NA,1,NA,4,NA,6,7,NA)
sp.2 <- c(2,NA,1,NA,5,6,7,8)
df.dummy <- data.frame(site, rep, sp.1, sp.2)
site rep sp.1 sp.2
1 1 1 NA 2
2 1 2 1 NA
3 2 1 NA 1
4 2 2 4 NA
5 3 1 NA 5
6 3 2 6 6
7 4 1 7 7
8 4 2 NA 8
In my dataset, I want to do things: In identical sites, but different reps which has NA for sp.1 in one row and NA for sp.2 in the other or vice-versa (e.g. first two rows in this data frame), then merge the column.
So the rows should like something like
site rep sp.1 sp.2
1 1 1 1 2
2 1 2 1 NA ---> get rid of this row
3 2 1 4 1
4 2 2 4 NA ---> get rid of this row
5 3 1 NA 5
6 3 2 6 6
7 4 1 7 7
8 4 2 NA 8
If, say, sp.2 has two data-points for identical site and rep (e.g. 5 and 6) then take the mean of the two points and proceed as in (1)
In the end, I need only 4 rows but but all filled with sp.1 and sp.2
site rep sp.1 sp.2
1 1 1 2
2 1 4 1
3 2 6 5.5
4 1 7 7.5
Edit: I have added the outcomes
CodePudding user response:
Probably, we need
library(dplyr)
df.dummy %>%
group_by(site) %>%
mutate(across(starts_with('sp'),
~.[order(is.na(.))])) %>%
filter(!if_all(starts_with('sp'), is.na)) %>%
summarise(rep= first(rep), across(starts_with('sp'),
mean, na.rm = TRUE))
-output
# A tibble: 4 × 4
site rep sp.1 sp.2
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 2
2 2 1 4 1
3 3 1 6 5.5
4 4 1 7 7.5
CodePudding user response:
EDIT based on clarification:
library(tidyverse)
df.dummy %>%
pivot_longer(sp.1:sp.2) %>%
group_by(site, name) %>%
summarize(value = mean(value, na.rm = TRUE), .groups = "drop") %>%
pivot_wider(names_from = name, values_from = value)
Or more simply:
df.dummy %>%
group_by(site) %>%
summarize(across(sp.1:sp.2, ~mean(.x, na.rm = TRUE)))
# A tibble: 4 x 3
site sp.1 sp.2
<dbl> <dbl> <dbl>
1 1 1 2
2 2 4 1
3 3 6 5.5
4 4 7 7.5
CodePudding user response:
Update:
The shorter version with keeping rep
column:
df.dummy %>%
group_by(site, first(rep)) %>%
summarize(across(sp.1:sp.2, ~mean(.x, na.rm = TRUE)))
first answer:
We could do it this way: grouping by first(rep)
:
library(dplyr)
df.dummy %>%
group_by(site, first(rep)) %>%
summarise(sp.1=mean(sp.1, na.rm = TRUE), sp.2=mean(sp.2, na.rm = TRUE))
# Groups: site [4]
site `first(rep)` sp.1 sp.2
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 2
2 2 1 4 1
3 3 1 6 5.5
4 4 1 7 7.5