Getting the median for specific categories within a data.frame

Here is a sample of the data I'm working with

county	urban_continuum	p.shannon	p.simpson
Brunswick	B_Town	3.804079	0.9744810
Accomack	A_Rural	3.830896	0.9771901
Buena Vista	B_Town	3.970617	0.9802289
Amherst	D_City	4.007048	0.9813272
Buckingham	C_Suburb	4.055685	0.9796187
Campbell	D_City	4.161142	0.9837963
Cumberland	A_Rural	4.229130	0.9850256
Danville	C_Suburb	4.631135	0.9888504

Note: "p.simpson" and "p.shannon" refer to Simpson diversity and Shannon diversity.

I'm trying to get the mean and the standard deviation for each category (e.g. the mean for "B_Town" is 3.97235). I first used aggregate. Here's what I have for the mean (the code for the standard deviation is the same, but with FUN="sd"):

urbancon_div.mean=aggregate(p.simpson~urban_continuum+p.shannon, data=plant.co, FUN="mean")

Here's what R gives me:

Notice that even though "county" is not in the code, it still gives me means for individual counties. I'm trying to find the mean of each diversity metric for each category across all counties. How do I get the mean and sd for each category across all counties, rather than for individual counties?
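
A likely explanation for the per-county rows, as a minimal sketch: in aggregate's formula interface, every variable to the right of ~ is a grouping variable. Assuming the original call had a + between urban_continuum and p.shannon, p.shannon was being used to group rather than being summarized, and because every county has a unique p.shannon value, each group ends up holding a single county. The names by_both and by_category below are only illustrative, and the data is the eight-row sample from the question.

```r
# Sample data from the question (the quotes keep 'Buena Vista' as one field)
plant.co <- read.table(text = "county urban_continuum p.shannon p.simpson
Brunswick B_Town 3.804079 0.9744810
Accomack A_Rural 3.830896 0.9771901
'Buena Vista' B_Town 3.970617 0.9802289
Amherst D_City 4.007048 0.9813272
Buckingham C_Suburb 4.055685 0.9796187
Campbell D_City 4.161142 0.9837963
Cumberland A_Rural 4.229130 0.9850256
Danville C_Suburb 4.631135 0.9888504", header = TRUE)

# p.shannon on the right-hand side acts as a second grouping variable.
# Its values are all distinct, so every group contains exactly one county.
by_both <- aggregate(p.simpson ~ urban_continuum + p.shannon,
                     data = plant.co, FUN = mean)
nrow(by_both)      # one row per county

# Grouping by the category alone collapses the counties within each category.
by_category <- aggregate(p.simpson ~ urban_continuum,
                         data = plant.co, FUN = mean)
nrow(by_category)  # one row per category
```

To summarize both metrics in one call, they would go on the left-hand side of the formula, wrapped in cbind().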

CodePudding user response:

You can try dplyr:

library(dplyr)

plant.co <- read.table(text = "county   urban_continuum p.shannon   p.simpson
Brunswick   B_Town  3.804079    0.9744810
Accomack    A_Rural 3.830896    0.9771901
'Buena Vista'   B_Town  3.970617    0.9802289
Amherst D_City  4.007048    0.9813272
Buckingham  C_Suburb    4.055685    0.9796187
Campbell    D_City  4.161142    0.9837963
Cumberland  A_Rural 4.229130    0.9850256
Danville    C_Suburb    4.631135    0.9888504", header = TRUE)

plant.co %>%
  group_by(urban_continuum) %>%
  summarize(p.shannon.mean = mean(p.shannon),
            p.shannon.sd = sd(p.shannon),
            p.simpson.mean = mean(p.simpson),
            p.simpson.sd = sd(p.simpson))

  urban_continuum p.shannon.mean p.shannon.sd p.simpson.mean p.simpson.sd
  <chr>                    <dbl>        <dbl>          <dbl>        <dbl>
1 A_Rural                   4.03        0.282          0.981      0.00554
2 B_Town                    3.89        0.118          0.977      0.00406
3 C_Suburb                  4.34        0.407          0.984      0.00653
4 D_City                    4.08        0.109          0.983      0.00175

CodePudding user response:

If you are using aggregate:

# \(x) is shorthand for function(x) and needs R 4.1 or later
aggregate(cbind(p.simpson, p.shannon) ~ urban_continuum, plant.co, \(x) c(mean = mean(x), sd = sd(x)))

  urban_continuum p.simpson.mean p.simpson.sd p.shannon.mean p.shannon.sd
1         A_Rural    0.981107850  0.005540535      4.0300130    0.2815940
2          B_Town    0.977354950  0.004064379      3.8873480    0.1177601
3        C_Suburb    0.984234550  0.006527798      4.3434100    0.4069046
4          D_City    0.982561750  0.001745917      4.0840950    0.1089609

or simply:

# plant.co[-1] drops the county column so that . covers only the two metrics
aggregate(. ~ urban_continuum, plant.co[-1], \(x) c(mean = mean(x), sd = sd(x)))
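
One thing to be aware of with this approach: when the summary function returns a vector, aggregate stores each metric as a matrix column, so the result has three columns rather than the five that the printout suggests. The sketch below shows one common way to flatten it, using do.call(data.frame, ...). The names res and flat are illustrative only, and function(x) is used in place of \(x) so that it also runs on older R versions.

```r
# Sample data from the question
plant.co <- read.table(text = "county urban_continuum p.shannon p.simpson
Brunswick B_Town 3.804079 0.9744810
Accomack A_Rural 3.830896 0.9771901
'Buena Vista' B_Town 3.970617 0.9802289
Amherst D_City 4.007048 0.9813272
Buckingham C_Suburb 4.055685 0.9796187
Campbell D_City 4.161142 0.9837963
Cumberland A_Rural 4.229130 0.9850256
Danville C_Suburb 4.631135 0.9888504", header = TRUE)

# Mean and sd of both metrics per category
res <- aggregate(cbind(p.simpson, p.shannon) ~ urban_continuum,
                 data = plant.co,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))
ncol(res)    # the grouping column plus two matrix columns

# Expand the matrix columns into ordinary columns
flat <- do.call(data.frame, res)
names(flat)
flat$p.shannon.mean   # now accessible as a plain numeric column
```

After flattening, columns such as p.shannon.mean can be used directly for plotting or merging.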