The data I am working with is from eBird, and I am looking to sort out species occurrence by both name and year. There are over 30k individual observations, each with its own number of birds. For example, in the raw data posted below, on Jan 1, 2021 someone observed 2 Cooper's Hawks.
The raw data looks like this:
specificName individualCount eventDate year
Cooper's Hawk 1 (1/1/2018) 2018
Cooper's Hawk 1 (1/1/2020) 2020
Cooper's Hawk 2 (1/1/2021) 2021
Ideally, I would be able to group all the Cooper's Hawks (specificName) by the year they were observed and sum the total individualCount. That way I can make statistical comparisons between the number of birds observed in 2018, 2019, 2020, and 2021.
I created a separate column for the year:
year <- as.POSIXct(ebird.df$eventDate, format = "%m/%d/%Y")
ebird.df$year <- as.numeric(format(year, "%Y"))
Then I aggregated with the following:
aggdata <- aggregate(ebird.df$individualCount, by = list(ebird.df$specificName, ebird.df$year), FUN = sum)
There are hundreds of bird species, and Cooper's Hawks start on the 115th row, so the output looks like this:
Group.1 Group.2 x
115 2018 Cooper's Hawk 86
116 2019 Cooper's Hawk 152
117 2020 Cooper's Hawk 221
118 2021 Cooper's Hawk 116
My question is how do I get the data into a table that looks like the following:
Species Name 2018 2019 2020 2021
Cooper's Hawk 86 152 221 116
I want to eventually run some basic ecology stats on the data using vegan, but one problem first, I guess lol
Thanks!
Answer:
There are errors in the data and code in the question, so we used the code and reproducible data given in the Note at the end.
Now, using xtabs we get an xtabs table directly from ebird.df like this. No packages are used.
xtabs(individualCount ~ specificName + year, ebird.df)
## year
## specificName 2018 2020 2021
## Cooper's Hawk 1 1 2
Optionally convert it to a data.frame:
xtabs(individualCount ~ specificName + year, ebird.df) |>
  as.data.frame.matrix()
## 2018 2020 2021
## Cooper's Hawk 1 1 2
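The data frame above keeps the species in the row names, whereas the table requested in the question has the species in its own column. A minimal sketch of one way to move it there, using only base R, follows. The column name Species and the variable name wide are arbitrary choices for illustration, and the input is rebuilt inline so the snippet runs on its own.

```r
# Same toy data as in the Note, rebuilt here so the snippet is self-contained
ebird.df <- data.frame(
  specificName = "Cooper's Hawk",
  individualCount = c(1, 1, 2),
  eventDate = c("(1/1/2018)", "(1/1/2020)", "(1/1/2021)"),
  year = c(2018, 2020, 2021)
)

# Species-by-year table of summed counts, converted to a data frame
wide <- as.data.frame.matrix(
  xtabs(individualCount ~ specificName + year, ebird.df)
)

# Copy the row names into an ordinary column, then drop the row names
# ("Species" is an illustrative name, not something from the question)
wide <- cbind(Species = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

The year columns keep their names ("2018", "2020", "2021"), so they need backticks when accessed with $, for example wide$`2021`.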
Although we did not need aggdata, if you need it for some other reason it can be computed using aggregate.formula like this:
aggregate(individualCount ~ specificName + year, ebird.df, sum)
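If you start from the long aggregate result rather than from xtabs, base R's reshape can also spread the years into columns. This is only a sketch under the same toy data; the names agg and wide2 are made up for illustration.

```r
# Same toy data as in the Note, rebuilt here so the snippet is self-contained
ebird.df <- data.frame(
  specificName = "Cooper's Hawk",
  individualCount = c(1, 1, 2),
  eventDate = c("(1/1/2018)", "(1/1/2020)", "(1/1/2021)"),
  year = c(2018, 2020, 2021)
)

# Long format: one row per species/year combination with the summed count
agg <- aggregate(individualCount ~ specificName + year, ebird.df, sum)

# Wide format: one row per species, one column per year
wide2 <- reshape(agg, idvar = "specificName", timevar = "year",
                 direction = "wide")
wide2
```

The resulting columns are named individualCount.2018, individualCount.2020 and so on. One difference to keep in mind: a species with no records in a given year shows up as NA here, while xtabs reports 0 for that cell.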
Note
Lines <- "specificName,individualCount,eventDate,year
\"Cooper's Hawk\",1,(1/1/2018),2018
\"Cooper's Hawk\",1,(1/1/2020),2020
\"Cooper's Hawk\",2,(1/1/2021),2021"
ebird.df <- read.csv(text = Lines, strip.white = TRUE)
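The Note supplies the year column directly. If it had to be derived from eventDate instead, the parentheses around the dates would get in the way of the "%m/%d/%Y" format used in the question. A possible sketch, assuming the parentheses really are part of the stored values as shown in the question:

```r
# Toy data without a year column; the parentheses in eventDate are assumed
# to be literally present, as they appear in the question
ebird.df <- data.frame(
  specificName = "Cooper's Hawk",
  individualCount = c(1, 1, 2),
  eventDate = c("(1/1/2018)", "(1/1/2020)", "(1/1/2021)")
)

# Strip the parentheses, parse as month/day/year, then extract the year
clean <- gsub("[()]", "", ebird.df$eventDate)
dates <- as.Date(clean, format = "%m/%d/%Y")
ebird.df$year <- as.numeric(format(dates, "%Y"))
ebird.df
```

If the real eventDate values have no parentheses, the gsub step simply leaves them unchanged.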