I need help in adding columns in an abundance df by plot

Time:11-21

I have an abundance df with 63 species in the columns and a column with the plots from 1 to 6. Each plot number repeats 9 times, once for each of its 9 subplots. The first 18 rows (2 plots) and first 3 columns look like this:

> taxa_ab
    plot Sp1 Sp2
1   1    0   0
2   1    1   1             
3   1    0   0               
4   1    0   0               
5   1    0   0               
6   1    0   3               
7   1    0   0               
8   1    0   0               
9   1    0   4               
10  2    4   0               
11  2    0   0               
12  2    0   2               
13  2    0   0               
14  2    0   0               
15  2    0   0
16  2    0   2               
17  2    0   0               
18  2    0   0               

I want to sum the species by plot so the plot becomes the row name and it looks like this:

> ab_new
    Sp1 Sp2
1   1   8
2   4   4 

I tried to use the aggregate function but I haven't understood how to use it.

ab_new <- taxa_ab[,-2] %>%
        aggregate(., by = plot, FUN = "sum")

Also, my species abundances are integers, and I can't seem to convert them to numeric without losing the structure of the data frame by unlisting the columns.

> str(taxa_ab)
'data.frame':   54 obs. of  64 variables:
 $ plot     : chr  "1" "1" "1" "1" ...
 $ Sp1      : int  0 1 0 0 0 0 0 0 0 0 ...
 $ Sp2      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Sp3      : int  0 0 0 1 0 0 1 2 1 1 ...

CodePudding user response:

Specifically, with the function you were already working on:

To use aggregate the way you want, by must be a list of grouping vectors, each as long as the data has rows, not just the name of a variable. That is why I'm passing taxa_ab[1]: single-bracket indexing returns a one-column data frame (which is itself a list) holding only the first column, plot.

You also want to pass the data without the grouping variable, which is why I use taxa_ab[-1].

Solution

library(magrittr)  # provides the %>% pipe

ab_new <- taxa_ab[-1] %>%
  aggregate(., by = taxa_ab[1], FUN = "sum")

Output

#   plot Sp1 Sp2
# 1    1   1   8
# 2    2   4   4
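
The question also asks for plot to become the row names, which the output above does not do yet. A minimal base R sketch of that last step follows. It assumes the 18-row excerpt from the question as input (rebuilt here by hand, with plot stored as character as in the str() output); the name ab_rs and the rowsum() alternative are my additions, not part of the original answer.

```r
# Data rebuilt from the question's 18-row excerpt (plot stored as character)
taxa_ab <- data.frame(
  plot = rep(c("1", "2"), each = 9),
  Sp1  = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  Sp2  = c(0L, 1L, 0L, 0L, 0L, 3L, 0L, 0L, 4L, 0L, 0L, 2L, 0L, 0L, 0L, 2L, 0L, 0L)
)

# Sum every species column within each plot
ab_new <- aggregate(taxa_ab[-1], by = taxa_ab[1], FUN = sum)

# Move the grouping column into the row names, then drop it
rownames(ab_new) <- ab_new$plot
ab_new$plot <- NULL

ab_new
#   Sp1 Sp2
# 1   1   8
# 2   4   4

# Alternative: rowsum() sums rows by group and uses the group values as row names
ab_rs <- rowsum(taxa_ab[-1], group = taxa_ab$plot)
```

Either route should give one row per plot; rowsum() skips the separate row-name step, at the cost of being a less commonly known function.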

CodePudding user response:

I think group_by and summarize are the functions that you are looking for.

group_by(plot) will group all of the rows in your data frame by the plot variable, and then you can use the summarize function to specify how to treat those grouped rows.

Based on your description, I think your code would look something like

library(dplyr)

ab_new <- taxa_ab %>%
   group_by(plot) %>%
   summarize(Sp1 = sum(Sp1),
             Sp2 = sum(Sp2))

CodePudding user response:

Base R solution

With aggregate, the pipe operator does more harm than good. In the formula interface the dot . has a special meaning: "all variables not yet in the formula". Since the dot also has a special meaning in magrittr's pipes, the two uses conflict. So use the (in this case) simpler formula interface without a pipe.

aggregate(. ~ plot, taxa_ab, sum)
#>   plot Sp1 Sp2
#> 1    1   1   8
#> 2    2   4   4

Created on 2022-11-21 with reprex v2.0.2

The summed columns stay integer, which R already treats as numeric, so sum works on them directly and the data frame structure is preserved.

agg <- aggregate(. ~ plot, taxa_ab, sum)
str(agg)
#> 'data.frame':    2 obs. of  3 variables:
#>  $ plot: int  1 2
#>  $ Sp1 : int  1 4
#>  $ Sp2 : int  8 4
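
If the abundances really do need to be stored as doubles, one way to convert them without unlisting is to assign back into the column subset rather than overwrite the whole object. The sketch below is an illustration only: it uses a small made-up data frame with the same layout as the question (plot as character, species as integer), not the asker's real data.

```r
# Made-up toy data with the same column layout as the question
taxa_ab <- data.frame(
  plot = rep(c("1", "2"), each = 3),
  Sp1  = c(0L, 1L, 0L, 4L, 0L, 0L),
  Sp2  = c(0L, 1L, 3L, 0L, 2L, 2L)
)

# Assigning into taxa_ab[-1] replaces the species columns one by one,
# so the object stays a data frame and plot is left untouched
taxa_ab[-1] <- lapply(taxa_ab[-1], as.numeric)

str(taxa_ab)
```

The key design choice is the left-hand side: `taxa_ab[-1] <- ...` keeps the data frame, whereas `taxa_ab <- as.numeric(unlist(taxa_ab))` would flatten it into a single vector.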


dplyr solution

Just group and summarise.

suppressPackageStartupMessages(
  library(dplyr)
)

taxa_ab %>%
  group_by(plot) %>%
  summarise(across(starts_with('Sp'), sum))
#> # A tibble: 2 × 3
#>    plot   Sp1   Sp2
#>   <int> <int> <int>
#> 1     1     1     8
#> 2     2     4     4


Data

taxa_ab<-'
plot Sp1 Sp2
1   1    0   0
2   1    1   1             
3   1    0   0               
4   1    0   0               
5   1    0   0               
6   1    0   3               
7   1    0   0               
8   1    0   0               
9   1    0   4               
10  2    4   0               
11  2    0   0               
12  2    0   2               
13  2    0   0               
14  2    0   0               
15  2    0   0
16  2    0   2               
17  2    0   0               
18  2    0   0          
'
taxa_ab <- read.table(textConnection(taxa_ab), header = TRUE)
