Summarizing categorical variables by multiple groups

I have 1609 observations from 93 unique publications. I am doing a qualitative analysis of my data, and I have the following variables: soil texture (coarse soil, sandy soil, sandy loam, sandy clay loam), experimental design (field, greenhouse, and lab), and publication title (93 unique publication titles). I want to count unique publication titles for each soil texture for each experimental design.

I could only get the number of unique publication titles for each experimental design or for each soil texture separately, using the following code:

df4_2 <- metadata2 %>%
  group_by(publication_title) %>%
  group_by(experiment_cond) %>%
  summarise(count = n_distinct(publication_title))%>%
  drop_na()
View(df4_2)

# OR

df4_3 <- metadata2 %>%
  group_by(publication_title) %>%
  group_by(soil_texture) %>%
  summarise(count = n_distinct(publication_title))%>%
  drop_na()
View(df4_3)

Does anyone know how I can count unique publication titles for each combination of soil texture and experimental design?

I tried the following code but it did not work:

df4_4 <- metadata2 %>%
  group_by(publication_title) %>%
  group_by(soil_texture) %>%
  group_by(experiment_cond) %>%
  summarise(count = n_distinct(publication_title))%>%
  drop_na()
View(df4_4)

Answer:

By default, each group_by() call replaces any existing grouping, so in your last attempt only experiment_cond was actually used. To group by multiple variables, include them in the same group_by() call:

library(dplyr)
library(tidyr)  # provides drop_na()

metadata2 %>%
  group_by(soil_texture, experiment_cond) %>%
  summarise(count = n_distinct(publication_title)) %>%
  drop_na()
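
To check the behaviour without your real data, here is a minimal sketch on a made-up data frame. The `toy` tibble and its values are hypothetical stand-ins for metadata2. Dropping NAs in the grouping columns before summarising, and using `.groups = "drop"` to return an ungrouped result, are my own choices rather than something your data requires:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for metadata2
toy <- tibble(
  publication_title = c("A", "A", "B", "B", "C", "C", "C"),
  soil_texture      = c("sandy soil", "sandy soil", "sandy soil",
                        "sandy loam", "sandy loam", "sandy loam", "sandy soil"),
  experiment_cond   = c("field", "field", "field",
                        "lab", "lab", NA, "greenhouse")
)

result <- toy %>%
  # remove rows with a missing grouping value (assumption: these are unwanted)
  drop_na(soil_texture, experiment_cond) %>%
  # one row per soil texture x experimental design combination
  group_by(soil_texture, experiment_cond) %>%
  # count distinct titles within each combination, then ungroup
  summarise(count = n_distinct(publication_title), .groups = "drop")

print(result)
```

Here the sandy loam/lab and sandy soil/field combinations each contain two distinct titles, and sandy soil/greenhouse contains one.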

If you did need to add groups in separate calls (not necessary here, but sometimes useful), use the .add argument:

metadata2 %>%
  group_by(soil_texture) %>%
  group_by(experiment_cond, .add = TRUE) %>%
  summarise(count = n_distinct(publication_title)) %>%
  drop_na()
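
If you want to see which grouping is actually in effect, group_vars() reports it. The sketch below uses a small hypothetical tibble (`toy`, not your data) to compare chained group_by() calls with and without `.add = TRUE`:

```r
library(dplyr)

# Hypothetical two-row example; only the column names matter here
toy <- tibble(
  soil_texture    = c("sandy soil", "sandy loam"),
  experiment_cond = c("field", "lab")
)

# Second group_by() replaces the first grouping
replaced <- toy %>%
  group_by(soil_texture) %>%
  group_by(experiment_cond)

# .add = TRUE keeps the existing grouping and appends to it
added <- toy %>%
  group_by(soil_texture) %>%
  group_by(experiment_cond, .add = TRUE)

group_vars(replaced)  # "experiment_cond"
group_vars(added)     # "soil_texture" "experiment_cond"
```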

Finally, note that you shouldn't group by publication_title: if you do, every group contains exactly one title, so n_distinct(publication_title) is always 1.
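
A quick way to convince yourself of this, again on hypothetical toy data rather than metadata2: once publication_title is part of the grouping, every count collapses to 1, whereas grouping by soil_texture alone gives the counts you are after.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  publication_title = c("A", "A", "B", "C", "C"),
  soil_texture      = c("sandy soil", "sandy soil", "sandy soil",
                        "sandy loam", "sandy loam")
)

# Grouping by the title itself: each group holds exactly one title
per_title <- toy %>%
  group_by(soil_texture, publication_title) %>%
  summarise(count = n_distinct(publication_title), .groups = "drop")

all(per_title$count == 1)  # TRUE

# Grouping by soil texture only: counts distinct titles per texture
per_texture <- toy %>%
  group_by(soil_texture) %>%
  summarise(count = n_distinct(publication_title), .groups = "drop")

print(per_texture)
```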