I have an abundance data frame with 63 species in the columns and one column identifying the plots, numbered 1 to 6. Each plot value repeats 9 times, once for each of its 9 subplots. The first 18 rows (2 plots) and first 3 columns look like this:
> taxa_ab
plot Sp1 Sp2
1 1 0 0
2 1 1 1
3 1 0 0
4 1 0 0
5 1 0 0
6 1 0 3
7 1 0 0
8 1 0 0
9 1 0 4
10 2 4 0
11 2 0 0
12 2 0 2
13 2 0 0
14 2 0 0
15 2 0 0
16 2 0 2
17 2 0 0
18 2 0 0
I want to sum the species by plot so the plot becomes the row name and it looks like this:
> ab_new
Sp1 Sp2
1 1 8
2 4 4
I tried to use the aggregate function but I haven't understood how to use it.
ab_new <- taxa.ab[,-2] %>%
aggregate(., by = plot, FUN = "sum")
Also, my species abundances are integers, and I can't seem to convert them to numeric without losing the structure of the data frame by unlisting the columns.
> str(taxa_ab)
'data.frame': 54 obs. of 64 variables:
$ plot : chr "1" "1" "1" "1" ...
$ Sp1 : int 0 1 0 0 0 0 0 0 0 0 ...
$ Sp2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ Sp3 : int 0 0 0 1 0 0 1 2 1 1 ...
CodePudding user response:
Specifically, with the function you were already working on:
To use aggregate the way you want, by should be a list of grouping vectors, each as long as the data has rows, not just the name of the variable. That is why I'm passing taxa.ab[1]: it selects only the first column (i.e., plot) and keeps it as a one-column data frame, which is a list.
You also want to pass the data without the grouping variable, which is why I use taxa.ab[-1].
Solution
ab_new <- taxa.ab[-1] %>%
aggregate(., by = taxa.ab[1], FUN = "sum")
Output
# plot Sp1 Sp2
# 1 1 1 8
# 2 2 4 4
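The question also asked for plot to become the row names rather than a column. A minimal base R sketch of that extra step, assuming the object is called taxa_ab as in the printed data and rebuilding only the two sample species from the question:

```r
# Toy data copied from the question's 18-row sample (two plots, two species)
taxa_ab <- data.frame(
  plot = rep(c("1", "2"), each = 9),
  Sp1  = c(0, 1, 0, 0, 0, 0, 0, 0, 0,  4, 0, 0, 0, 0, 0, 0, 0, 0),
  Sp2  = c(0, 1, 0, 0, 0, 3, 0, 0, 4,  0, 0, 2, 0, 0, 0, 2, 0, 0)
)

# Sum every species column within each plot;
# taxa_ab["plot"] is a one-column data frame, i.e. a named list
ab_new <- aggregate(taxa_ab[-1], by = taxa_ab["plot"], FUN = sum)

# Move the grouping column into the row names, then drop it
rownames(ab_new) <- ab_new$plot
ab_new$plot <- NULL
ab_new
```

Selecting the grouping column by name instead of by position is only a readability choice; taxa_ab[1] behaves the same when plot is the first column.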
CodePudding user response:
I think group_by and summarize are the functions that you are looking for. group_by(plot) will group all of the rows in your data frame by the plot variable, and then you can use the summarize function to specify how to treat those grouped rows.
Based on your description, I think your code would look something like
df = df %>%
group_by(plot) %>%
summarize(Sp1 = sum(Sp1),
Sp2 = sum(Sp2))
CodePudding user response:
Base R solution
With aggregate, the pipe operator is doing more harm than good. In the formula interface the dot . has a special meaning: "all variables not yet in the formula". Since the dot also has a special meaning in magrittr's pipes, the two conflict. So use the formula interface, which is simpler in this case.
aggregate(. ~ plot, taxa_ab, sum)
#> plot Sp1 Sp2
#> 1 1 1 8
#> 2 2 4 4
Created on 2022-11-21 with reprex v2.0.2
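To make the meaning of the dot concrete, here is a small sketch that compares the dot formula with one that names the species explicitly through cbind. The data are rebuilt from the sample above, and the object names all_species and named_species are only illustrative.

```r
# Toy data copied from the 18-row sample (two plots, two species)
taxa_ab <- data.frame(
  plot = rep(1:2, each = 9),
  Sp1  = as.integer(c(0, 1, 0, 0, 0, 0, 0, 0, 0,  4, 0, 0, 0, 0, 0, 0, 0, 0)),
  Sp2  = as.integer(c(0, 1, 0, 0, 0, 3, 0, 0, 4,  0, 0, 2, 0, 0, 0, 2, 0, 0))
)

# The dot stands for every column that does not appear on the right-hand side
all_species <- aggregate(. ~ plot, data = taxa_ab, FUN = sum)

# Naming the response columns explicitly should give the same sums
named_species <- aggregate(cbind(Sp1, Sp2) ~ plot, data = taxa_ab, FUN = sum)

# Compare the per-plot totals from both calls
identical(all_species$Sp1, named_species$Sp1)
identical(all_species$Sp2, named_species$Sp2)
```

With 63 species the dot form is clearly the more practical one; the explicit form is mainly useful when only a few columns should be summed.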
The resulting columns are all numeric (integers are a numeric type in R), so no structure is lost.
agg <- aggregate(. ~ plot, taxa_ab, sum)
str(agg)
#> 'data.frame': 2 obs. of 3 variables:
#> $ plot: int 1 2
#> $ Sp1 : int 1 4
#> $ Sp2 : int 8 4
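If double-precision columns are wanted anyway, the conversion can be done without unlisting. This is a sketch under the assumption that every column except plot is a species column; species_cols is a hypothetical helper name. Note that sum already works on integers, so this step is optional.

```r
# Toy data with integer species columns, as in the question's str() output
taxa_ab <- data.frame(
  plot = rep(c("1", "2"), each = 9),
  Sp1  = as.integer(c(0, 1, 0, 0, 0, 0, 0, 0, 0,  4, 0, 0, 0, 0, 0, 0, 0, 0)),
  Sp2  = as.integer(c(0, 1, 0, 0, 0, 3, 0, 0, 4,  0, 0, 2, 0, 0, 0, 2, 0, 0)),
  stringsAsFactors = FALSE
)

# Every column except the grouping column is treated as a species column
species_cols <- setdiff(names(taxa_ab), "plot")

# lapply returns a list of converted columns; assigning it back into the
# same column subset keeps the object a data frame
taxa_ab[species_cols] <- lapply(taxa_ab[species_cols], as.numeric)

str(taxa_ab)
```

Assigning into taxa_ab[species_cols] rather than overwriting taxa_ab is what preserves the data frame; calling as.numeric on the whole object or on unlist(taxa_ab) would not.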
dplyr solution
Just group and summarise.
suppressPackageStartupMessages(
library(dplyr)
)
taxa_ab %>%
group_by(plot) %>%
summarise(across(starts_with('Sp'), sum))
#> # A tibble: 2 × 3
#> plot Sp1 Sp2
#> <int> <int> <int>
#> 1 1 1 8
#> 2 2 4 4
Data
taxa_ab<-'
plot Sp1 Sp2
1 1 0 0
2 1 1 1
3 1 0 0
4 1 0 0
5 1 0 0
6 1 0 3
7 1 0 0
8 1 0 0
9 1 0 4
10 2 4 0
11 2 0 0
12 2 0 2
13 2 0 0
14 2 0 0
15 2 0 0
16 2 0 2
17 2 0 0
18 2 0 0
'
taxa_ab <- read.table(textConnection(taxa_ab), header = TRUE)