I have a list of species with details on temperature and other variables that need to be in a wide format to compute analysis. I need to have one row per date and site, and all other variables, like temperature, just the average of the date but respecting the site. Find a diagram of what I need below
I have found ways to do it, but I can not find a way to get the other variables, for example, the temperature or cloud cover averaged by day. I hope someone can help me
CodePudding user response:
where do the numbers come from in the columns starting with sp?
CodePudding user response:
Replace the values in the columns you want to average with the averages before pivoting. Your example data doesn’t illustrate the problem of having varying temps within a site/date, so I modified the data a bit:
library(tidyverse)
tbl <- tibble(
site = c("A", "A", "A", "B", "B", "B"),
sp = c("sp1", "sp1", "sp2", "sp1", "sp3", "sp4"),
day = c(1, 1, 2, 2, 2, 3),
temp = c(17, 20, 16, 18, 18, 20)
)
tbl
#> # A tibble: 6 x 4
#> site sp day temp
#> <chr> <chr> <dbl> <dbl>
#> 1 A sp1 1 17
#> 2 A sp1 1 20
#> 3 A sp2 2 16
#> 4 B sp1 2 18
#> 5 B sp3 2 18
#> 6 B sp4 3 20
And here’s the averaging step added:
tbl |>
group_by(site, day) |>
mutate(across(where(is.numeric), mean)) |>
pivot_wider(
names_from = sp,
values_from = sp,
values_fn = length,
values_fill = 0L
)
#> # A tibble: 4 x 7
#> # Groups: site, day [4]
#> site day temp sp1 sp2 sp3 sp4
#> <chr> <dbl> <dbl> <int> <int> <int> <int>
#> 1 A 1 18.5 2 0 0 0
#> 2 A 2 16 0 1 0 0
#> 3 B 2 18 1 0 1 0
#> 4 B 3 20 0 0 0 1