How to find the population for certain variables

Time:02-15

I am having trouble getting a desired result in R and am seeking assistance. I have included my data below.

##       ID        DOB sector meters Oct   Res_FROM     Res_TO   Exp_FROM
## 1  20100 1979-08-24    H38   6400   W 1979-08-15 1991-05-15 1979-08-24
## 2  20101 1980-05-05    B01   1600  NW 1980-05-15 1991-04-15 1980-05-15
## 3  20102 1979-03-17    H04   1600  SW 1972-06-15 1979-08-15 1979-03-17
## 4  20103 1981-11-30    B09   3200  NE 1982-01-15 1984-01-15 1982-01-15
## 5  20103 1981-11-30    B37   8000   N 1984-01-15 1986-04-15 1984-01-15
## 6  20104 1978-09-01    B09   3200  NE 1982-01-15 1984-01-15 1982-01-15

From this data, I want R to count how many IDs are in each sector. I shortened the data to keep it uncluttered, but the full set has 100 sectors. For example, I need a result where sector B01 is listed with x IDs, sector B02 with x IDs, and so on. My overall goal is to find the population of each sector, where each individual is identified by an ID.

CodePudding user response:

In base R with aggregate:

aggregate(ID ~ sector, data = df, FUN = function(x) length(unique(x)))

  sector ID
1    B01  1
2    B09  2
3    B37  1
4    H04  1
5    H38  1
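
To try the aggregate call yourself, df has to exist first. Here is a minimal sketch, assuming a data frame built from the six sample rows in the question and keeping only the ID and sector columns (the names counts and counts_vec are just illustrative). It also shows tapply as a cross-check that should give the same numbers:

```r
# Assumption: only the ID and sector columns of the six sample rows are needed
df <- data.frame(
  ID = c(20100, 20101, 20102, 20103, 20103, 20104),
  sector = c("H38", "B01", "H04", "B09", "B37", "B09")
)

# Count distinct IDs per sector with aggregate()
counts <- aggregate(ID ~ sector, data = df, FUN = function(x) length(unique(x)))

# The same counts as a named vector, via tapply()
counts_vec <- tapply(df$ID, df$sector, function(x) length(unique(x)))

print(counts)
print(counts_vec)
```

Using length(unique(x)) rather than length(x) matters if the same ID can appear more than once within a sector, since each individual is then counted only once.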

Using the dplyr package:

library(dplyr)

df %>% 
  group_by(sector) %>% 
  summarize(count = n_distinct(ID)) %>% 
  ungroup()

  sector count
  <chr>  <int>
1 B01        1
2 B09        2
3 B37        1
4 H04        1
5 H38        1

If you want to add this variable to your data frame, use mutate instead of summarize.
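
A minimal sketch of the mutate variant, assuming the same six sample rows from the question (only ID and sector are kept, and df_with_count is a hypothetical name):

```r
library(dplyr)

# Assumption: only the ID and sector columns of the six sample rows are needed
df <- data.frame(
  ID = c(20100, 20101, 20102, 20103, 20103, 20104),
  sector = c("H38", "B01", "H04", "B09", "B37", "B09")
)

# mutate() keeps every original row and attaches the per-sector count to each
df_with_count <- df %>%
  group_by(sector) %>%
  mutate(count = n_distinct(ID)) %>%
  ungroup()

print(df_with_count)
```

Unlike summarize, this returns one row per original record, so both B09 rows carry the same count.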
