Home > OS >  R - How to merge several rows into one by summing the values while keeping the columns intact
R - How to merge several rows into one by summing the values while keeping the columns intact

Time:10-11

I've been struggling on R for the last two days with a simple basic operation that would take 1 min to perform in Excel. Let me explain: I have a dataframe with 65 columns (variables, the species found) and 75 rows (obs, the quadrats). The first 25 rows are quadrats of site A, then the next 25 from site B then the last 25 from site C. Each column is a species and if the species X is present in Site A1, we have a 1 and if it's not there is a zero. I would like to study the diversity index of each site and not of each quadrat which is what specnumber() and diversity() will give me. Here is an exemple of the dataframe with 5 quadrats from site A and 3 species:

Site Quadrat Sp X. Sp Y. Sp Z.
A Quadrat A1 1 1 0
A Quadrat A2 0 1 0
A Quadrat A3 1 1 0
A Quadrat A4 0 0 1
A Quadrat A5 1 1 1

And here is what I would like to get:

Site Sp X. Sp Y. Sp Z.
Site A 3 4 2

I have tried sumRows and sum of course, as well as rbind and dozens of other things found online but nothing seems to work, I am really new to R:

new <- rbind(df, L = sum(df[1:25,]) )

new <- rowSums(df[1:25,]

Thank you so much

CodePudding user response:

Some example data.

df <- structure(list(Sites = c("Quadrat A1", "Quadrat A2", "Quadrat A3", 
"Quadrat A4", "Quadrat A5"), `Sp X.` = c(0L, 1L, 1L, 0L, 1L), 
    `Sp Y.` = c(0L, 1L, 1L, 0L, 1L), `Sp Z.` = c(0L, 0L, 0L, 
    1L, 1L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L))

Tidyverse solution. It should work with any number of sites.

library("dplyr")
library("stringr")

df |>
  mutate(Sites=str_glue("Site {x}", x=str_extract(Sites, "(?<=\\s)[[:alpha:]] "))) |> 
  group_by(Sites) |>
  summarize(across(where(is.numeric), sum), .groups="drop")

Result.

# A tibble: 1 × 4
  Sites  `Sp X.` `Sp Y.` `Sp Z.`
  <glue>   <int>   <int>   <int>
1 Site A       3       3       2

CodePudding user response:

You might reformat your data by adding a column for Site. You can write code to pull out the "A" and "B" from Quadrant, but that is making it harder than it needs to be.

df <- data.frame("Site" = c("A", "A", "A", "B", "B", "C"), "SpecA" = c(1,0,1,1,1,0), "SpecB" = c(0,1,0,1,1,1))

Then do something like this:

siteA <- sum(df$SpecA[df$Site == "A"])

CodePudding user response:

If You know how many rows each squares occupy, you can use colSums for these rows and make a data frame out of it.

For exapmle data:

       Sites Sp_x Sp_y Sp_z
1 Quadrat A1    1    1    0
2 Quadrat A2    0    1    0
3 Quadrat A3    1    1    0
4 Quadrat B1    0    0    1
5 Quadrat B2    1    1    1
6 Quadrat B3    1    1    0
7 Quadrat C1    1    1    0
8 Quadrat C2    0    0    1
9 Quadrat C3    0    0    0
Quadrat_A <- colSums(data[1:3,-1])
Quadrat_B <- colSums(data[4:6,-1])
Quadrat_C <- colSums(data[7:9,-1])

data_merged <- t(data.frame(Quadrat_A,Quadrat_B,Quadrat_C, stringsAsFactors=FALSE))

Output:

          Sp_x Sp_y Sp_z
Quadrat_A    2    3    0
Quadrat_B    2    2    2
Quadrat_C    1    1    1
  •  Tags:  
  • rsum
  • Related