Count a string & sum them in a new column in R using dplyr?

I have a dataset with different types of observations across several "transects". I'm still pretty new to R and am struggling with the issue below.

I need to calculate the number of "nest" observations in each transect, but I am getting an error that makes me think I may not be using the correct function. In the end, I want a new column called "nest_number" that holds, for each transect, the number of observations equal to "nest".

The data is in this format:

transect	observation
1A	nest
1A	NA
1A	nest
1A	vocalization
1A	NA
2A	nest
2A	NA
...	...

Here is how I need the output to look:

transect	observation	nest_number
1A	nest	2
1A	NA	2
1A	nest	2
1A	vocalization	2
1A	NA	2
2A	nest	1
2A	NA	1
...	...	...

Here is the code I used:

dfNew <- df %>%
  group_by(transect) %>%
  mutate(number_nests = colSums(observation == "nest", na.rm = TRUE))

The error I get is:

'x' must be an array of at least two dimensions
The error occurred in group 1: transect = "1A".

Answer:

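It should be sum and not colSums: colSums expects an array of at least two dimensions (a matrix or data.frame), but here we are summing a logical vector (observation == "nest"). sum counts each TRUE as 1 and each FALSE as 0, so within each transect it returns the number of "nest" observations.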
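A minimal base-R check of the difference, using a small hypothetical vector obs that mirrors the observations of transect 1A (obs and is_nest are illustrative names, not from the question):

```r
# Hypothetical vector mirroring the observations of transect 1A
obs <- c("nest", NA, "nest", "vocalization", NA)

# Logical vector: TRUE NA TRUE FALSE NA
is_nest <- obs == "nest"

# colSums() requires an array with at least two dimensions, so a plain vector fails
res <- tryCatch(colSums(is_nest, na.rm = TRUE),
                error = function(e) conditionMessage(e))
print(res)

# sum() treats TRUE as 1 and FALSE as 0; na.rm = TRUE drops the NAs
n_nest <- sum(is_nest, na.rm = TRUE)
print(n_nest)
```

This reproduces the same message as in the question for a single group, while sum returns the per-group count.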

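library(dplyr)

df %>% 
  group_by(transect) %>% 
  # sum() counts the TRUE values of observation == "nest" within each transect;
  # na.rm = TRUE drops the NAs produced by comparing missing observations
  mutate(nest_number = sum(observation == "nest", na.rm = TRUE)) %>%
  ungroup()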

Output:

# A tibble: 7 × 3
  transect observation  nest_number
  <chr>    <chr>              <int>
1 1A       nest                   2
2 1A       <NA>                   2
3 1A       nest                   2
4 1A       vocalization           2
5 1A       <NA>                   2
6 2A       nest                   1
7 2A       <NA>                   1
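To reproduce the output above end to end, the visible sample rows from the question can be rebuilt as a tibble. This is a sketch that assumes the data looks exactly like the sample shown (the "..." rows are left out, so it is not the full dataset):

```r
library(dplyr)

# Visible sample rows from the question; the "..." rows are omitted
df <- tibble(
  transect    = c("1A", "1A", "1A", "1A", "1A", "2A", "2A"),
  observation = c("nest", NA, "nest", "vocalization", NA, "nest", NA)
)

dfNew <- df %>%
  group_by(transect) %>%
  # Count rows equal to "nest" within each transect; NA comparisons are dropped
  mutate(nest_number = sum(observation == "nest", na.rm = TRUE)) %>%
  ungroup()

print(dfNew)
```

Because mutate (not summarise) is used, every row keeps its original observation and receives its transect's count, which matches the requested layout.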