Getting the median for specific categories within a data.frame

Time:12-08

Here is a sample of the data I'm working with

county urban_continuum p.shannon p.simpson
Brunswick B_Town 3.804079 0.9744810
Accomack A_Rural 3.830896 0.9771901
Buena Vista B_Town 3.970617 0.9802289
Amherst D_City 4.007048 0.9813272
Buckingham C_Suburb 4.055685 0.9796187
Campbell D_City 4.161142 0.9837963
Cumberland A_Rural 4.229130 0.9850256
Danville C_Suburb 4.631135 0.9888504

Note: "p.simpson" and "p.shannon" refer to Simpson diversity and Shannon diversity.

I'm trying to get the mean and the standard deviation for each category (e.g. the mean for "B_Town" is 3.97235). I first used aggregate. Here's what I have for the mean (the code for standard deviation is the same but with FUN="sd"):

urbancon_div.mean=aggregate(p.simpson~urban_continuum+p.shannon, data=plant.co, FUN="mean")

Here's what R gives me:

[Image: aggregated df]

Notice that even though "county" is not in the code, it still gives me means for individual counties. I'm trying to find the mean of each diversity metric for each category across all counties. How do I get the mean and sd for each category across all counties, not by individual counties?
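A likely cause, assuming the formula had both `urban_continuum` and `p.shannon` on the right-hand side of `~`: `aggregate` treats every right-hand-side variable as a grouping variable. Each county has a unique `p.shannon` value, so each group contains a single row. The sketch below rebuilds the sample data and compares the two formulas; the names `per_row` and `per_category` are only illustrative.

```r
# Sample data from the question
plant.co <- read.table(text = "county urban_continuum p.shannon p.simpson
Brunswick B_Town 3.804079 0.9744810
Accomack A_Rural 3.830896 0.9771901
'Buena Vista' B_Town 3.970617 0.9802289
Amherst D_City 4.007048 0.9813272
Buckingham C_Suburb 4.055685 0.9796187
Campbell D_City 4.161142 0.9837963
Cumberland A_Rural 4.229130 0.9850256
Danville C_Suburb 4.631135 0.9888504", header = TRUE)

# Grouping by urban_continuum AND p.shannon: one group per row,
# so the "mean" is just each county's own p.simpson value
per_row <- aggregate(p.simpson ~ urban_continuum + p.shannon,
                     data = plant.co, FUN = mean)
nrow(per_row)       # 8, one row per county

# Grouping by urban_continuum only: one row per category
per_category <- aggregate(p.simpson ~ urban_continuum,
                          data = plant.co, FUN = mean)
nrow(per_category)  # 4, one row per category
```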

CodePudding user response:

You may try using dplyr

library(dplyr)

plant.co <- read.table(text = "county   urban_continuum p.shannon   p.simpson
Brunswick   B_Town  3.804079    0.9744810
Accomack    A_Rural 3.830896    0.9771901
'Buena Vista'   B_Town  3.970617    0.9802289
Amherst D_City  4.007048    0.9813272
Buckingham  C_Suburb    4.055685    0.9796187
Campbell    D_City  4.161142    0.9837963
Cumberland  A_Rural 4.229130    0.9850256
Danville    C_Suburb    4.631135    0.9888504", header = TRUE)

plant.co %>%
  group_by(urban_continuum) %>%
  summarize(p.shannon.mean = mean(p.shannon),
            p.shannon.sd = sd(p.shannon),
            p.simpson.mean = mean(p.simpson),
            p.simpson.sd = sd(p.simpson))

  urban_continuum p.shannon.mean p.shannon.sd p.simpson.mean p.simpson.sd
  <chr>                    <dbl>        <dbl>          <dbl>        <dbl>
1 A_Rural                   4.03        0.282          0.981      0.00554
2 B_Town                    3.89        0.118          0.977      0.00406
3 C_Suburb                  4.34        0.407          0.984      0.00653
4 D_City                    4.08        0.109          0.983      0.00175
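If there are more diversity metrics than these two, writing out each `mean()` and `sd()` call gets repetitive. A possible variant, assuming dplyr 1.0.0 or later, uses `across()` to apply both functions to every numeric column; the name `summary_tbl` is only illustrative.

```r
library(dplyr)

# Sample data from the question
plant.co <- read.table(text = "county urban_continuum p.shannon p.simpson
Brunswick B_Town 3.804079 0.9744810
Accomack A_Rural 3.830896 0.9771901
'Buena Vista' B_Town 3.970617 0.9802289
Amherst D_City 4.007048 0.9813272
Buckingham C_Suburb 4.055685 0.9796187
Campbell D_City 4.161142 0.9837963
Cumberland A_Rural 4.229130 0.9850256
Danville C_Suburb 4.631135 0.9888504", header = TRUE)

# Apply mean and sd to every numeric column within each category;
# .names builds column names such as p.shannon.mean and p.shannon.sd
summary_tbl <- plant.co %>%
  group_by(urban_continuum) %>%
  summarize(across(where(is.numeric),
                   list(mean = mean, sd = sd),
                   .names = "{.col}.{.fn}"))

summary_tbl
```

The non-numeric `county` column is skipped by `where(is.numeric)`, so it does not need to be dropped first.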

CodePudding user response:

If you are using aggregate (the `\(x)` function shorthand needs R 4.1 or later):

aggregate(cbind(p.simpson, p.shannon) ~ urban_continuum, plant.co, \(x) c(mean = mean(x), sd = sd(x)))

  urban_continuum p.simpson.mean p.simpson.sd p.shannon.mean p.shannon.sd
1         A_Rural    0.981107850  0.005540535      4.0300130    0.2815940
2          B_Town    0.977354950  0.004064379      3.8873480    0.1177601
3        C_Suburb    0.984234550  0.006527798      4.3434100    0.4069046
4          D_City    0.982561750  0.001745917      4.0840950    0.1089609

or simply:

aggregate(. ~ urban_continuum, plant.co[-1], \(x) c(mean = mean(x), sd = sd(x)))
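One thing to be aware of with this approach: when the function returns a named vector, `aggregate` stores the results as matrix columns, so the result prints with five column headers but holds only three columns. A minimal sketch of flattening it with `do.call(data.frame, ...)` follows; the names `res` and `flat` are only illustrative, and `function(x)` is used instead of `\(x)` so it also runs on R versions before 4.1.

```r
# Sample data from the question
plant.co <- read.table(text = "county urban_continuum p.shannon p.simpson
Brunswick B_Town 3.804079 0.9744810
Accomack A_Rural 3.830896 0.9771901
'Buena Vista' B_Town 3.970617 0.9802289
Amherst D_City 4.007048 0.9813272
Buckingham C_Suburb 4.055685 0.9796187
Campbell D_City 4.161142 0.9837963
Cumberland A_Rural 4.229130 0.9850256
Danville C_Suburb 4.631135 0.9888504", header = TRUE)

# Mean and sd of both metrics per category
res <- aggregate(cbind(p.simpson, p.shannon) ~ urban_continuum, plant.co,
                 function(x) c(mean = mean(x), sd = sd(x)))
ncol(res)   # 3: urban_continuum plus two matrix columns

# Expand the matrix columns into ordinary columns
flat <- do.call(data.frame, res)
ncol(flat)  # 5
names(flat)
```

After flattening, columns such as `flat$p.shannon.mean` can be used directly for plotting or merging.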