I have a dataframe containing observations of scallop presence/absence across multiple sites. I would like to count the number of scallops per site, using the UID (unique identifier) and the presence/absence column (binary: 0 is absent, 1 is present).
My dataframe looks like this:
UID | Present.Absent | Size.cm | binary |
---|---|---|---|
A-10-2021 | Present | 4.60 | 1 |
A-10-2021 | Present | 6.0 | 1 |
A-11-2021 | Present | 4.70 | 1 |
A-11-2021 | Present | 4.8 | 1 |
A-4-2021 | Absent | NA | 0 |
A-5-2021 | Present | 5.90 | 1 |
A-5-2021 | Present | 6.00 | 1 |
A-5-2021 | Present | 6.00 | 1 |
A-5-2021 | Present | 3.90 | 1 |
A-5-2021 | Present | 5.00 | 1 |
A-6-2021 | Absent | NA | 0 |
and it goes on for about ~6000 observations, with about 1500 different UIDs
I am new to R, and wasn't sure how to go about this. Is there a way to have it so there's one row per UID, with a column of abundance data? Any help is much appreciated, and if any additional information would help, I am happy to provide. Thank you!
Edit: added sample of data ; first 10 rows
structure(list(UID = c("A-10-2021", "A-10-2021", "A-11-2021",
"A-11-2021", "A-1-2021", "A-1-2021", "A-1-2021", "A-12-2021",
"A-12-2021", "A-12-2021"), Present.Absent = c("Present", "Present",
"Present", "Present", "Present", "Present", "Present", "Present",
"Present", "Present"), Alive.Dead = c("Alive", "Alive", "Alive",
"Alive", "Alive", "Alive", "Alive", "Alive", "Alive", "Alive"
), Size.cm = c(4.6, 5.25, 4.7, 5.1, 3.5, 3.9, 4.7, 4.7, 4.9,
4.9), binary = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(3L,
4L, 9L, 10L, 14L, 15L, 17L, 36L, 37L, 38L), class = "data.frame")
CodePudding user response:
You can use group_by() to achieve that:
# Your data
temp1 <- structure(list(UID = c("A-10-2021", "A-10-2021", "A-11-2021",
"A-11-2021", "A-1-2021", "A-1-2021", "A-1-2021", "A-12-2021",
"A-12-2021", "A-12-2021"), Present.Absent = c("Present", "Present",
"Present", "Present", "Present", "Present", "Present", "Present",
"Present", "Present"), Alive.Dead = c("Alive", "Alive", "Alive",
"Alive", "Alive", "Alive", "Alive", "Alive", "Alive", "Alive"
), Size.cm = c(4.6, 5.25, 4.7, 5.1, 3.5, 3.9, 4.7, 4.7, 4.9,
4.9), id = c(3L, 4L, 9L, 10L, 14L, 15L, 17L, 36L, 37L, 38L)), row.names = c(3L,
4L, 9L, 10L, 14L, 15L, 17L, 36L, 37L, 38L), class = "data.frame")
Note that you can first create your binary column (isPresent) by using mutate() and ifelse().
library(tidyverse)
# Option 1: Create a new column with abundance, by UID, but keep the number of rows
temp1 %>% mutate(isPresent = ifelse(Present.Absent == "Present", 1, 0)) %>% group_by(UID) %>% mutate(abundance = sum(isPresent))
# Option 2: Get a summary, with one row per UID
temp1 %>% mutate(isPresent = ifelse(Present.Absent == "Present", 1, 0)) %>% group_by(UID) %>% summarise(abundance = sum(isPresent))