I have a dataframe df1
in which I have different sites (df1$Site
) belonging to different regions (df1$Regions
) in which I have data about evidence of herbivory and its type (df1$Herbivory_type
). When there is no herbivory then, df1$Herbivory_type
is NA
. Below I show an example of my dataframe:
df1 <- data.frame(Region=c("ALI1","ALI1","ALI1","ALI1","ALI2","ALI2","ALI2","ALI3","ALI3","ALI3","ALI3","ALI5","ALI5"),
Site=c("ALI1_A","ALI1_B","ALI1_C","ALI1_D","ALI2_A","ALI2_B","ALI2_C","ALI3_A","ALI3_B","ALI3_C","ALI3_D","ALI5_A","ALI5_B"),
Herbivory_type=c(NA,"S",NA,NA,NA,NA,NA,NA,"S","S",NA,NA,"S"))
df1$Herbivory_type <- as.factor(df1$Herbivory_type)
df1
Region Site Herbivory_type
1 ALI1 ALI1_A <NA>
2 ALI1 ALI1_B S
3 ALI1 ALI1_C <NA>
4 ALI1 ALI1_D <NA>
5 ALI2 ALI2_A <NA>
6 ALI2 ALI2_B <NA>
7 ALI2 ALI2_C <NA>
8 ALI3 ALI3_A <NA>
9 ALI3 ALI3_B S
10 ALI3 ALI3_C S
11 ALI3 ALI3_D <NA>
12 ALI5 ALI5_A <NA>
13 ALI5 ALI5_B S
I need to know the number of episodes of herbivorism by Region considering NA
in the counting in df1$Site
. I would expect this result:
df2
Region N_Hervivory_S
1 ALI1 1
2 ALI2 0 # All sites have `NA`, thus, herbivorims is 0 in this region.
3 ALI3 2
4 ALI5 1
I tried this:
as.data.frame(df1 %>% group_by(Region,Herbivory_type) %>% summarise(N = n()))
But the output is not what I desire
Region Herbivory_type N
1 ALI1 S 1
2 ALI1 <NA> 3
3 ALI2 <NA> 3
4 ALI3 S 2
5 ALI3 <NA> 2
6 ALI5 S 1
7 ALI5 <NA> 1
Does anyone know how to do it?
Thanks in advance
CodePudding user response:
You can use count()
to sum up !is.na(Herbivory_type)
by group, and get the number of non-missing values for each region.
library(dplyr)
df1 %>%
count(Region, wt = !is.na(Herbivory_type))
# # A tibble: 4 × 2
# Region res
# <chr> <int>
# 1 ALI1 1
# 2 ALI2 0
# 3 ALI3 2
# 4 ALI5 1
CodePudding user response:
library(dplyr)
df1 %>%
group_by(Region) %>%
summarise(n_Herbivory_S = sum(Herbivory_type %in% c("S")))
(assuming there may be other categories to ignore in the real dataset - otherwise !is.na()
is simpler)
CodePudding user response:
You can count non-NAs, i.e.
library(dplyr)
df1 %>%
group_by(Region) %>%
summarise(res = sum(!is.na(Herbivory_type)))
# A tibble: 4 × 2
Region res
<chr> <int>
1 ALI1 1
2 ALI2 0
3 ALI3 2
4 ALI5 1