How to find the population for certain variables

I am having trouble getting a desired result in R and am seeking assistance. I have included my data below.

##       ID        DOB sector meters Oct   Res_FROM     Res_TO   Exp_FROM
## 1  20100 1979-08-24    H38   6400   W 1979-08-15 1991-05-15 1979-08-24
## 2  20101 1980-05-05    B01   1600  NW 1980-05-15 1991-04-15 1980-05-15
## 3  20102 1979-03-17    H04   1600  SW 1972-06-15 1979-08-15 1979-03-17
## 4  20103 1981-11-30    B09   3200  NE 1982-01-15 1984-01-15 1982-01-15
## 5  20103 1981-11-30    B37   8000   N 1984-01-15 1986-04-15 1984-01-15
## 6  20104 1978-09-01    B09   3200  NE 1982-01-15 1984-01-15 1982-01-15

From this data, I want R to count how many IDs are in each sector. I shortened the data to keep it uncluttered, but the full data set has 100 sectors. For example, I need a result where sector B01 is listed with its number of IDs, sector B02 with its number of IDs, and so on. My overall goal is to find the population of each sector, where each individual is identified by an ID.

User response:

In base R, use aggregate() with a function that counts the distinct IDs in each sector:

aggregate(ID ~ sector, data = df, FUN = function(x) length(unique(x)))

  sector ID
1    B01  1
2    B09  2
3    B37  1
4    H04  1
5    H38  1
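
To try this on the rows shown in the question, the sample can be rebuilt by hand. This is a minimal sketch, not the asker's full data: only the ID and sector columns are recreated, the names res and counts are made up for illustration, and the tapply() line is an alternative that should give the same counts as a named array when a data frame is not needed.

```r
# Sample rows from the question; only the two columns needed for counting.
df <- data.frame(
  ID     = c(20100, 20101, 20102, 20103, 20103, 20104),
  sector = c("H38", "B01", "H04", "B09", "B37", "B09"),
  stringsAsFactors = FALSE
)

# Distinct IDs per sector, returned as a data frame (one row per sector).
res <- aggregate(ID ~ sector, data = df, FUN = function(x) length(unique(x)))
print(res)

# Same counts as a named array, indexed by sector (hypothetical name `counts`).
counts <- tapply(df$ID, df$sector, function(x) length(unique(x)))
print(counts[["B09"]])
```

Using length(unique(x)) rather than length(x) matters when the same ID has several rows in one sector, since each individual should be counted once.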

Using the dplyr package:

library(dplyr)

df %>% 
  group_by(sector) %>% 
  summarize(count = n_distinct(ID)) %>% 
  ungroup()

  sector count
  <chr>  <int>
1 B01        1
2 B09        2
3 B37        1
4 H04        1
5 H38        1

If you want to add this count as a column in your original data frame (keeping every row) instead of getting one row per sector, use mutate() instead of summarize().
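
A minimal sketch of the mutate() variant, assuming the same six sample rows (only ID and sector are recreated here, and df_counts is a made-up name):

```r
library(dplyr)

# Sample rows from the question; only the two columns needed for counting.
df <- data.frame(
  ID     = c(20100, 20101, 20102, 20103, 20103, 20104),
  sector = c("H38", "B01", "H04", "B09", "B37", "B09")
)

# mutate() keeps all six rows and repeats each sector's distinct-ID count
# on every row that belongs to that sector.
df_counts <- df %>%
  group_by(sector) %>%
  mutate(count = n_distinct(ID)) %>%
  ungroup()

print(df_counts)
```

Unlike summarize(), mutate() leaves the data grouped by sector, so the final ungroup() is what keeps later operations from being applied per sector by accident.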