I have 1609 observations from 93 unique publications. I am doing a qualitative analysis of my data, and I have the following variables: soil texture (coarse soil, sandy soil, sandy loam, sandy clay loam), experimental design (field, greenhouse, and lab), and publication title (93 unique publication titles). I want to count unique publication titles for each soil texture for each experimental design.
I could only get the number of unique publication titles for each experimental design or for each soil texture separately, using the following code:
df4_2 <- metadata2 %>%
  group_by(publication_title) %>%
  group_by(experiment_cond) %>%
  summarise(count = n_distinct(publication_title)) %>%
  drop_na()
View(df4_2)
# OR
df4_3 <- metadata2 %>%
  group_by(publication_title) %>%
  group_by(soil_texture) %>%
  summarise(count = n_distinct(publication_title)) %>%
  drop_na()
View(df4_3)
Does anyone know how I can count unique publication titles for each combination of soil texture and experimental design?
I tried the following code but it did not work:
df4_4 <- metadata2 %>%
  group_by(publication_title) %>%
  group_by(soil_texture) %>%
  group_by(experiment_cond) %>%
  summarise(count = n_distinct(publication_title)) %>%
  drop_na()
View(df4_4)
Answer:
By default, each group_by() overrides/drops any previous groupings. To group by multiple variables, include them in the same group_by() call:
library(dplyr)
library(tidyr)  # provides drop_na()

metadata2 %>%
  # group by both variables in a single call
  group_by(soil_texture, experiment_cond) %>%
  summarise(count = n_distinct(publication_title)) %>%
  drop_na()
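To see what the combined grouping returns, here is a minimal sketch on made-up data. The toy data frame and its values are hypothetical; only the column names come from the question. The .groups = "drop" argument is optional and simply returns an ungrouped result without the regrouping message.

```r
library(dplyr)

# Hypothetical toy data: column names follow the question, values are invented.
toy <- data.frame(
  publication_title = c("A", "A", "B", "C", "C", "D"),
  soil_texture      = c("sandy soil", "sandy soil", "sandy soil",
                        "sandy loam", "sandy loam", "sandy loam"),
  experiment_cond   = c("field", "field", "field", "lab", "lab", "field")
)

# One row per soil_texture x experiment_cond combination,
# counting distinct publications within each combination.
counts <- toy %>%
  group_by(soil_texture, experiment_cond) %>%
  summarise(count = n_distinct(publication_title), .groups = "drop")

print(counts)
```

In this toy data, publications A and B both fall under sandy soil in the field, so that combination gets a count of 2 even though it has three rows.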
If you did need to add groups in separate calls (not necessary here, but sometimes useful), use the .add argument:
metadata2 %>%
  group_by(soil_texture) %>%
  group_by(experiment_cond, .add = TRUE) %>%
  summarise(count = n_distinct(publication_title)) %>%
  drop_na()
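One way to check which grouping is actually in effect is to inspect it with group_vars(). The sketch below uses a small hypothetical data frame (values invented, column names from the question) to compare chained group_by() calls with and without .add = TRUE.

```r
library(dplyr)

# Hypothetical toy data: column names follow the question, values are invented.
toy <- data.frame(
  publication_title = c("A", "B", "C"),
  soil_texture      = c("sandy soil", "sandy loam", "sandy soil"),
  experiment_cond   = c("field", "lab", "field")
)

# Without .add, the second group_by() replaces the first grouping.
replaced <- toy %>%
  group_by(soil_texture) %>%
  group_by(experiment_cond)
replaced_vars <- group_vars(replaced)
print(replaced_vars)  # "experiment_cond"

# With .add = TRUE, the second grouping is appended to the first.
added <- toy %>%
  group_by(soil_texture) %>%
  group_by(experiment_cond, .add = TRUE)
added_vars <- group_vars(added)
print(added_vars)     # "soil_texture" "experiment_cond"
```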
Finally, note you shouldn’t group by publication_title; if you do, the n_distinct() per group would always be 1.
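A quick illustration of that last point, again on hypothetical data (values invented, column names from the question): once publication_title is one of the grouping variables, every group contains exactly one title, so the distinct count cannot exceed 1.

```r
library(dplyr)

# Hypothetical toy data: column names follow the question, values are invented.
toy <- data.frame(
  publication_title = c("A", "A", "B", "B", "C"),
  soil_texture      = c("sandy soil", "sandy loam", "sandy soil",
                        "sandy soil", "sandy loam")
)

# Grouping by publication_title as well: each group holds a single title.
per_title <- toy %>%
  group_by(publication_title, soil_texture) %>%
  summarise(count = n_distinct(publication_title), .groups = "drop")

all_ones <- all(per_title$count == 1)
print(all_ones)  # TRUE
```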