Does anyone know how to translate this Stata command into an R command?
by city, sort : egen float total_population = total (id)
Example
id city
1 a
1 a
1 a
2 r
2 r
3 r
6 h
7 h
8 h
9 h
10 h
Expected result
id city total_population
1 a 1
1 a 1
1 a 1
2 r 2
2 r 2
3 r 2
6 h 5
7 h 5
8 h 5
9 h 5
10 h 5
Answer:
We need n_distinct (the number of distinct elements in 'id') after grouping by 'city':
library(dplyr)

# Within each city, count the distinct ids and repeat that count on every row
df1 <- df1 %>%
  group_by(city) %>%
  mutate(total_population = n_distinct(id)) %>%
  ungroup()
Output:
df1
# A tibble: 11 × 3
id city total_population
<int> <chr> <int>
1 1 a 1
2 1 a 1
3 1 a 1
4 2 r 2
5 2 r 2
6 3 r 2
7 6 h 5
8 7 h 5
9 8 h 5
10 9 h 5
11 10 h 5
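For comparison, Stata's egen total() sums the values of its argument within each group, so a literal translation of the command in the question would be a grouped sum rather than a distinct count. The base R sketch below, which assumes the same example data, shows that a grouped sum of 'id' gives 3, 7 and 40 instead of the expected 1, 2 and 5. That is why n_distinct is used above. The column name sum_id is only illustrative.

```r
# Example data from the question
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 6L, 7L, 8L, 9L, 10L),
  city = c("a", "a", "a", "r", "r", "r", "h", "h", "h", "h", "h")
)

# Literal analogue of Stata's egen total(): the sum of 'id' within each city,
# repeated on every row of that city (sum_id is an illustrative name)
df1$sum_id <- with(df1, ave(id, city, FUN = sum))

df1
```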
In base R, this can be done with ave:
# ave() applies FUN to 'id' within each city and returns a vector as long as 'id'
df1$total_population <- with(df1, ave(id, city,
                                      FUN = function(x) length(unique(x))))
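A base R sketch that stays closer to the Stata way of writing this would tag the first occurrence of each (city, id) pair and then sum those tags within each city. This mirrors the common Stata pattern of egen tag() followed by egen total(). The helper column tag is hypothetical and is dropped at the end. The sketch assumes the same example data and should give the same total_population as the ave() call above.

```r
# Example data from the question
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 6L, 7L, 8L, 9L, 10L),
  city = c("a", "a", "a", "r", "r", "r", "h", "h", "h", "h", "h")
)

# Step 1: flag the first row of each distinct (city, id) pair with 1, others 0
# (similar in spirit to Stata's egen tag(); 'tag' is a hypothetical helper)
df1$tag <- as.integer(!duplicated(df1[c("city", "id")]))

# Step 2: add up the flags within each city, like egen total() by city
df1$total_population <- with(df1, ave(tag, city, FUN = sum))

# Drop the helper column
df1$tag <- NULL
df1
```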
Data:
df1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 6L, 7L, 8L, 9L,
10L), city = c("a", "a", "a", "r", "r", "r", "h", "h", "h", "h",
"h")), class = "data.frame", row.names = c(NA, -11L))