I would like to group all members of the same genera together for some summary statistics, but would like to maintain their full names in the original dataframe. I know that I could change their names or create a new column in the original dataframe but I am lookng for a more elegant solution. I would like to implement this in R and the dplyr package.
Example data here https://knb.ecoinformatics.org/knb/d1/mn/v2/object/urn:uuid:eb176981-1909-4d6d-ac07-3406e4efc43f
I would like to group all clams of the genus Macoma as one group, "Macoma sp." but ideally creating this grouping within the following, perhapse before the group_by(site_code, species_scientific)
summary <- data %>%
group_by(site_code, species_scientific) %>%
summarize(mean_size = mean(width_mm))
Note that there are multiple Macoma xxx species and multiple other species that I want to group as is.
CodePudding user response:
We may replace
the species_scientific by replace
ing the elements that have the substring 'Macoma' (str_detect
) with 'Macoma', use that as grouping column and get the mean
library(dplyr)
library(stringr)
data %>%
mutate(species_scientific = replace(species_scientific,
str_detect(species_scientific, "Macoma"), "Macoma")) %>%
group_by(site_code, species_scientific) %>%
summarise(mean_size = mean(width_mm, na.rm = TRUE), .groups = 'drop')
-output
# A tibble: 97 × 3
site_code species_scientific mean_size
<chr> <chr> <dbl>
1 H_01_a Clinocardium nuttallii 33.9
2 H_01_a Macoma 41.0
3 H_01_a Protothaca staminea 37.3
4 H_01_a Saxidomus gigantea 56.0
5 H_01_a Tresus nuttallii 100.
6 H_02_a Clinocardium nuttallii 35.1
7 H_02_a Macoma 41.3
8 H_02_a Protothaca staminea 38.0
9 H_02_a Saxidomus gigantea 54.7
10 H_02_a Tresus nuttallii 50.5
# … with 87 more rows
If the intention is to keep only the first word in 'species_scientific'
data %>%
group_by(genus = str_remove(species_scientific, "\\s .*"), site_code) %>%
summarise(mean_size = mean(width_mm, na.rm = TRUE), .groups = 'drop')
-output
# A tibble: 97 × 3
genus site_code mean_size
<chr> <chr> <dbl>
1 Clinocardium H_01_a 33.9
2 Clinocardium H_02_a 35.1
3 Clinocardium H_03_a 37.5
4 Clinocardium H_04_a 48.2
5 Clinocardium H_05_a 37.6
6 Clinocardium H_06_a 38.7
7 Clinocardium H_07_a 40.2
8 Clinocardium L_01_a 44.4
9 Clinocardium L_02_a 54.8
10 Clinocardium L_03_a 61.1
# … with 87 more rows