I am having trouble getting a desired result in R and am seeking assistance. I have included my data below.
##      ID        DOB sector meters Oct   Res_FROM     Res_TO   Exp_FROM
## 1 20100 1979-08-24    H38   6400   W 1979-08-15 1991-05-15 1979-08-24
## 2 20101 1980-05-05    B01   1600  NW 1980-05-15 1991-04-15 1980-05-15
## 3 20102 1979-03-17    H04   1600  SW 1972-06-15 1979-08-15 1979-03-17
## 4 20103 1981-11-30    B09   3200  NE 1982-01-15 1984-01-15 1982-01-15
## 5 20103 1981-11-30    B37   8000   N 1984-01-15 1986-04-15 1984-01-15
## 6 20104 1978-09-01    B09   3200  NE 1982-01-15 1984-01-15 1982-01-15
From this data, I want R to count how many IDs are in each sector. I shortened the data to avoid clutter, but the full set has 100 sectors. For example, I need a result that lists sector B01 with its number of IDs, sector B02 with its number of IDs, and so on. My overall goal is to find the population of individuals in each sector, where each individual is identified by an ID.
Answer:
In base R, with aggregate():
aggregate(ID ~ sector, function(ID) length(unique(ID)), data = df)
  sector ID
1    B01  1
2    B09  2
3    B37  1
4    H04  1
5    H38  1
Using the dplyr package:
library(dplyr)
df %>%
  group_by(sector) %>%
  summarize(count = n_distinct(ID)) %>%
  ungroup()
  sector count
  <chr>  <int>
1 B01        1
2 B09        2
3 B37        1
4 H04        1
5 H38        1
If you want to add this variable to your data frame, use mutate() instead of summarize(). This keeps every original row and repeats the sector's count on each of them.
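As a minimal sketch of the mutate() variant, assuming df holds just the ID and sector columns from the sample data in the question (the name df_with_count is only illustrative, not something from the original post):

```r
library(dplyr)

# Only the two relevant columns of the sample data from the question
df <- data.frame(
  ID     = c(20100, 20101, 20102, 20103, 20103, 20104),
  sector = c("H38", "B01", "H04", "B09", "B37", "B09")
)

# mutate() keeps all six rows and adds the per-sector count of distinct IDs
df_with_count <- df %>%
  group_by(sector) %>%
  mutate(count = n_distinct(ID)) %>%
  ungroup()

print(df_with_count)
```

Here both B09 rows should get a count of 2, and every other row a count of 1. The ungroup() at the end drops the grouping so that later operations on the result are not silently applied per sector.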