How to merge row data for two different columns in dplyr R?-CodePudding

site <- c(1,1,2,2,3,3,4,4)
rep <- c(1,2,1,2,1,2,1,2)
sp.1 <- c(NA,1,NA,4,NA,6,7,NA)
sp.2 <-  c(2,NA,1,NA,5,6,7,8)
df.dummy <- data.frame(site, rep, sp.1, sp.2)

  site rep sp.1 sp.2
1    1   1   NA    2
2    1   2    1   NA
3    2   1   NA    1
4    2   2    4   NA
5    3   1   NA    5
6    3   2    6    6
7    4   1    7    7
8    4   2   NA    8

In my dataset, I want to do things: In identical sites, but different reps which has NA for sp.1 in one row and NA for sp.2 in the other or vice-versa (e.g. first two rows in this data frame), then merge the column.

So the rows should like something like

 site rep sp.1 sp.2
1    1   1    1    2
2    1   2    1   NA ---> get rid of this row
3    2   1   4    1
4    2   2    4   NA ---> get rid of this row
5    3   1   NA    5 
6    3   2    6    6
7    4   1    7    7
8    4   2   NA    8

If, say, sp.2 has two data-points for identical site and rep (e.g. 5 and 6) then take the mean of the two points and proceed as in (1)

In the end, I need only 4 rows but but all filled with sp.1 and sp.2

 site rep sp.1 sp.2
    1   1    1    2
    2   1    4    1
    3   2    6    5.5
    4   1    7    7.5

Edit: I have added the outcomes

CodePudding user response：

Probably, we need

library(dplyr)
df.dummy %>% 
   group_by(site) %>% 
   mutate(across(starts_with('sp'), 
          ~.[order(is.na(.))])) %>% 
   filter(!if_all(starts_with('sp'),  is.na)) %>% 
   summarise(rep= first(rep), across(starts_with('sp'), 
         mean, na.rm = TRUE))

-output

# A tibble: 4 × 4
   site   rep  sp.1  sp.2
  <dbl> <dbl> <dbl> <dbl>
1     1     1     1   2  
2     2     1     4   1  
3     3     1     6   5.5
4     4     1     7   7.5

CodePudding user response：

EDIT based on clarification:

library(tidyverse)
df.dummy %>%
  pivot_longer(sp.1:sp.2) %>%
  group_by(site, name) %>%
  summarize(value = mean(value, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = name, values_from = value)

Or more simply:

df.dummy %>%
  group_by(site) %>%
  summarize(across(sp.1:sp.2, ~mean(.x, na.rm = TRUE)))

# A tibble: 4 x 3
   site  sp.1  sp.2
  <dbl> <dbl> <dbl>
1     1     1   2  
2     2     4   1  
3     3     6   5.5
4     4     7   7.5

CodePudding user response：

Update: The shorter version with keeping rep column:

df.dummy %>%
  group_by(site, first(rep)) %>%
  summarize(across(sp.1:sp.2, ~mean(.x, na.rm = TRUE)))

first answer: We could do it this way: grouping by first(rep):

library(dplyr)
df.dummy %>% 
  group_by(site, first(rep)) %>% 
  summarise(sp.1=mean(sp.1, na.rm = TRUE), sp.2=mean(sp.2, na.rm = TRUE))

# Groups:   site [4]
   site `first(rep)`  sp.1  sp.2
  <dbl>        <dbl> <dbl> <dbl>
1     1            1     1   2  
2     2            1     4   1  
3     3            1     6   5.5
4     4            1     7   7.5