I have referred to Create a presence/absence column based on presence records [duplicate], Create a presence-absence matrix with presence on specific dates, and Presence-absence matrix but continually run into the issue of my species-association columns.
Using a large, longitudinal dataset based on primate behavior, I have created a species/association table. I have an extra column, variable, likely due to my attempt at grouping "community_id". All reproducible subset of my dataset is seen below.
dput data -
data <- structure(list(Species = c("BABO", "BW", "RC", "BW", "RC", "SKS",
"SKS", "RC", "RC", "SKS", "BW", "RC", "RC", "RC", "RC", "SKS",
"RC", "SKS", "SKS", "RC"), Association = c(NA, "SKS", NA, "RC",
"BW", "SKS", NA, NA, NA, "BW", "SKS", NA, "SKS", "BW", "SKS",
NA, NA, "SKS", NA, "MANG"), variable = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = "community_id", class = "factor"), community_id = c("2007-4-16.C3",
"2007-4-16.C3", "2007-4-16.C3", "2007-4-17.Mwani", "2007-4-17.Mwani",
"2007-4-17.Mwani", "2007-4-17.Mwani", "2007-4-18.Sanje", "2007-4-18.Sanje",
"2007-4-18.Sanje", "2007-4-18.Sanje", "2007-5-8.C3", "2007-5-9.Mwani",
"2007-5-9.Mwani", "2007-5-9.Mwani", "2007-5-10.Sanje", "2007-5-10.Sanje",
"2007-6-6.C3", "2007-6-6.C3", "2007-6-6.C3")), row.names = c(NA,
20L), class = "data.frame")
Output -
Species Association variable community_id
<chr> <chr> <chr> <chr>
1 BABO NA community_id 2007-4-16.C3
2 BW SKS community_id 2007-4-16.C3
3 RC NA community_id 2007-4-16.C3
4 BW RC community_id 2007-4-17.Mwani
5 RC BW community_id 2007-4-17.Mwani
6 SKS SKS community_id 2007-4-17.Mwani
7 SKS NA community_id 2007-4-17.Mwani
8 RC NA community_id 2007-4-18.Sanje
9 RC NA community_id 2007-4-18.Sanje
10 SKS BW community_id 2007-4-18.Sanje
11 BW SKS community_id 2007-4-18.Sanje
12 RC NA community_id 2007-5-8.C3
13 RC SKS community_id 2007-5-9.Mwani
14 RC BW community_id 2007-5-9.Mwani
15 RC SKS community_id 2007-5-9.Mwani
16 SKS NA community_id 2007-5-10.Sanje
17 RC NA community_id 2007-5-10.Sanje
18 SKS SKS community_id 2007-6-6.C3
19 SKS NA community_id 2007-6-6.C3
20 RC MANG community_id 2007-6-6.C3
I need a presence-absense matrix grouped by the community_id. I tried to group by the column "community_id" already, which is where I believe I created the additional and seemingly irrelevant "variable" column. I am looking for the below output -
community_id BABO BW RC SKS Mang
<chr> <chr> <chr> <chr> <chr> <chr>
2007-4-16.C3 1 1 1 1 0
2007-4-17.Mwani 0 1 1 1 0
2007-4-18.Sanje 0 1 1 1 0
2007-5-8.C3 0 0 1 0 0
2007-5-9.Mwani 0 1 1 1 0
2007-5-10.Sanje 0 0 1 1 0
2007-6-6.C3 0 0 1 1 1
Any advice or assistance is greatly appreciated! Have a lovely day.
CodePudding user response:
You could do:
library(dplyr)
data %>%
group_by(community_id) %>%
summarize(as_tibble(t(sapply(c("BABO", "BW", "RC", "SKS", "Mang"),
function(x) as.numeric(x %in% Species)))))
#> # A tibble: 7 x 6
#> community_id BABO BW RC SKS Mang
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2007-4-16.C3 1 1 1 0 0
#> 2 2007-4-17.Mwani 0 1 1 1 0
#> 3 2007-4-18.Sanje 0 1 1 1 0
#> 4 2007-5-10.Sanje 0 0 1 1 0
#> 5 2007-5-8.C3 0 0 1 0 0
#> 6 2007-5-9.Mwani 0 0 1 0 0
#> 7 2007-6-6.C3 0 0 1 1 0
Created on 2023-01-19 with reprex v2.0.2
CodePudding user response:
A solution using pivot_wider
, combining variables Species and Association first.
library(tidyr)
library(dplyr)
rbind(as.matrix(data[, -2]), as.matrix(data[, -1])) %>%
as_tibble() %>%
distinct() %>%
na.omit() %>%
pivot_wider(community_id, names_from=Species, values_from=Species,
values_fn=function(x) any(unique(x) == x) * 1, values_fill = 0)
# A tibble: 7 × 6
community_id BABO BW RC SKS MANG
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2007-4-16.C3 1 1 1 1 0
2 2007-4-17.Mwani 0 1 1 1 0
3 2007-4-18.Sanje 0 1 1 1 0
4 2007-5-8.C3 0 0 1 0 0
5 2007-5-9.Mwani 0 1 1 1 0
6 2007-5-10.Sanje 0 0 1 1 0
7 2007-6-6.C3 0 0 1 1 1
CodePudding user response:
Basically you want to look if the complete species spc
are unique
ly %in%
the communities.
ufun <- \(x) unique(na.omit(unlist(x))) ## helper function
# spc <- ufun(data[1:2]) ## this might work on your complete data
spc <- c('BABO', 'BW', 'RC', 'SKS', 'Mang') ## here hard coded
by(data[1:2], data$community_id, \(x) setNames( (spc %in% ufun(x)), spc)) |>
do.call(what=rbind)
# BABO BW RC SKS Mang
# 2007-4-16.C3 1 1 1 1 0
# 2007-4-17.Mwani 0 1 1 1 0
# 2007-4-18.Sanje 0 1 1 1 0
# 2007-5-10.Sanje 0 0 1 1 0
# 2007-5-8.C3 0 0 1 0 0
# 2007-5-9.Mwani 0 1 1 1 0
# 2007-6-6.C3 0 0 1 1 0
CodePudding user response:
Base R solution
Make a table()
of unique()
combinations of community_id
and Species
:
table(unique(data[c("community_id", "Species")]))
Species
community_id BABO BW RC SKS
2007-4-16.C3 1 1 1 0
2007-4-17.Mwani 0 1 1 1
2007-4-18.Sanje 0 1 1 1
2007-5-10.Sanje 0 0 1 1
2007-5-8.C3 0 0 1 0
2007-5-9.Mwani 0 0 1 0
2007-6-6.C3 0 0 1 1
tidyverse solution
First subset to distinct()
values of community_id
and Species
; then create a Present
variable set to 1 for all observations; then pivot_longer()
, using the values_fill
arg to add 0s for community-species combinations that aren’t observed.
library(dplyr)
library(tidyr)
data %>%
distinct(community_id, Species) %>%
mutate(Present = 1L) %>%
pivot_wider(
names_from = Species,
values_from = Present,
values_fill = 0L
)
# A tibble: 7 × 5
community_id BABO BW RC SKS
<chr> <int> <int> <int> <int>
1 2007-4-16.C3 1 1 1 0
2 2007-4-17.Mwani 0 1 1 1
3 2007-4-18.Sanje 0 1 1 1
4 2007-5-8.C3 0 0 1 0
5 2007-5-9.Mwani 0 0 1 0
6 2007-5-10.Sanje 0 0 1 1
7 2007-6-6.C3 0 0 1 1