Stata command to R: sum with conditions

Time:03-31

Does anyone know how to translate this Stata command into R?

by city, sort : egen float total_population = total (id)

Example

id  city
1   a
1   a
1   a
2   r
2   r
3   r
6   h
7   h
8   h
9   h
10  h

Expected result

id  city    total_population
1   a   1
1   a   1
1   a   1
2   r   2
2   r   2
3   r   2
6   h   5
7   h   5
8   h   5
9   h   5
10  h   5

Answer:

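The expected result is the number of distinct values of 'id' in each city, not the sum of 'id' that egen total(id) computes. So we need n_distinct (the number of distinct elements in 'id') after grouping by 'city':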

library(dplyr)
df1 <- df1 %>% 
   group_by(city) %>%                              # one group per city
   mutate(total_population = n_distinct(id)) %>%   # count distinct ids within each group
   ungroup()                                       # drop the grouping afterwards

Output:

df1
# A tibble: 11 × 3
      id city  total_population
   <int> <chr>            <int>
 1     1 a                    1
 2     1 a                    1
 3     1 a                    1
 4     2 r                    2
 5     2 r                    2
 6     3 r                    2
 7     6 h                    5
 8     7 h                    5
 9     8 h                    5
10     9 h                    5
11    10 h                    5
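A literal translation of egen total(id) would sum the id values within each city, which is not what the expected result shows. The sketch below uses base R only so it runs without dplyr; the column names sum_id and n_ids are made up for illustration. It puts the grouped sum and the distinct count side by side on the example data:

```r
# Rebuild the example data so this block runs on its own
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 6L, 7L, 8L, 9L, 10L),
  city = c("a", "a", "a", "r", "r", "r", "h", "h", "h", "h", "h")
)

# Literal translation of `egen total(id)`: sum of id within each city
df1$sum_id <- ave(df1$id, df1$city, FUN = sum)

# What the expected output actually contains: number of distinct ids per city
df1$n_ids <- ave(df1$id, df1$city, FUN = function(x) length(unique(x)))

df1
```

For city a the grouped sum is 3 while the distinct count is 1, and only the distinct count matches the expected total_population column.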

In base R, the same distinct count can be computed with ave():

# ave() applies FUN to id within each city and returns a vector as long as id
df1$total_population <- with(df1, ave(id, city,
     FUN = function(x) length(unique(x))))
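If you prefer to mirror the usual Stata idiom for counting distinct values (egen tag = tag(city id) followed by bysort city: egen total_population = total(tag)), the same two steps can be sketched in base R. This assumes a plain data.frame; the helper column tag is hypothetical and is dropped at the end:

```r
# Rebuild the example data so this block runs on its own
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 6L, 7L, 8L, 9L, 10L),
  city = c("a", "a", "a", "r", "r", "r", "h", "h", "h", "h", "h")
)

# Like Stata's tag(city id): 1 for the first row of each (city, id) pair, 0 for repeats
df1$tag <- as.integer(!duplicated(df1[c("city", "id")]))

# Like `bysort city: egen total_population = total(tag)`:
# summing the tags within a city counts its distinct ids
df1$total_population <- ave(df1$tag, df1$city, FUN = sum)

df1$tag <- NULL  # remove the hypothetical helper column
df1
```

Here duplicated() plays the role of tag() and ave(..., FUN = sum) plays the role of total() within each city, so each step maps onto one line of the Stata version.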

Data:

df1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 6L, 7L, 8L, 9L, 
10L), city = c("a", "a", "a", "r", "r", "r", "h", "h", "h", "h", 
"h")), class = "data.frame", row.names = c(NA, -11L))