How to create a dummy that equals 1 (and 0 otherwise) if an id appears only once? (in R)

df <- structure(list(id = c(1L, 1L, 2L, 3L, 3L, 4L), hire_year = c(2017L, 
2017L, 2017L, 2017L, 2016L, 2016L)), class = "data.frame", row.names = c(NA, 
-6L))
  id hire_year
1  1      2017
2  1      2017
3  2      2017
4  3      2017
5  3      2016
6  4      2016

**Expected output**
  id hire_year dummy
1  1      2017     0
2  1      2017     0
3  2      2017     1
4  3      2017     0
5  3      2016     0
6  4      2016     1

How can I create a dummy variable that equals 1 if an id appears only once, and 0 otherwise?

CodePudding user response：

With the tidyverse, we can group by id, then test the group size n() inside an ifelse() call.

library(tidyverse)

df %>%
  group_by(id) %>%
  mutate(dummy = ifelse(n() == 1, 1, 0))

Or we could first store the per-id count in dummy with add_count(), then overwrite it with 1 or 0 depending on whether the count equals 1.

df %>% 
  add_count(id, name = "dummy") %>% 
  mutate(dummy = ifelse(dummy == 1, 1, 0))

Output

  id hire_year dummy
1  1      2017     0
2  1      2017     0
3  2      2017     1
4  3      2017     0
5  3      2016     0
6  4      2016     1
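
One detail about the group_by() approach: the result stays grouped by id, which can affect later steps. Below is a minimal self-contained sketch, assuming the example data is stored as df and that an integer column is acceptable; the name result is only for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id = c(1L, 1L, 2L, 3L, 3L, 4L),
  hire_year = c(2017L, 2017L, 2017L, 2017L, 2016L, 2016L)
)

result <- df %>%
  group_by(id) %>%
  # n() is the number of rows in the current id group
  mutate(dummy = as.integer(n() == 1)) %>%
  # drop the grouping so later verbs work on the whole table
  ungroup()

result$dummy
#> [1] 0 0 1 0 0 1
```

as.integer() on the logical comparison gives the same 0/1 values as ifelse(n() == 1, 1, 0), stored as integer rather than double.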

CodePudding user response：

library(dplyr)
structure(list(id = c(1L, 1L, 2L, 3L, 3L, 4L), hire_year = c(2017L, 
2017L, 2017L, 2017L, 2016L, 2016L)), class = "data.frame", row.names = c(NA, 
-6L)) %>% 
    add_count(id, name = 'dummy') %>% 
    mutate(
        dummy = as.integer(dummy == 1)
    )
#>   id hire_year dummy
#> 1  1      2017     0
#> 2  1      2017     0
#> 3  2      2017     1
#> 4  3      2017     0
#> 5  3      2016     0
#> 6  4      2016     1

Created on 2022-03-04 by the reprex package (v2.0.0)

CodePudding user response：

We can use ave() in base R, as shown below. The comparison returns TRUE/FALSE, so the leading + converts it to 1/0.

> transform(df, dummy = +(ave(id, id, FUN = length) == 1))
  id hire_year dummy
1  1      2017     0
2  1      2017     0
3  2      2017     1
4  3      2017     0
5  3      2016     0
6  4      2016     1
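
Another base R option, not taken from the answer above, is to flag repeated ids with duplicated() scanned from both directions. This is a sketch under the assumption that the data frame is named df; the helper name repeated is hypothetical.

```r
# Example data from the question
df <- data.frame(
  id = c(1L, 1L, 2L, 3L, 3L, 4L),
  hire_year = c(2017L, 2017L, 2017L, 2017L, 2016L, 2016L)
)

# duplicated() marks every occurrence after the first; with fromLast = TRUE
# it marks every occurrence before the last. Their union marks all rows
# whose id appears more than once.
repeated <- duplicated(df$id) | duplicated(df$id, fromLast = TRUE)

# dummy is 1 only for ids that are never repeated
df$dummy <- as.integer(!repeated)

df$dummy
#> [1] 0 0 1 0 0 1
```

This avoids computing group sizes at all, which may be convenient when only the unique/non-unique distinction matters.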

CodePudding user response：

A data.table solution:

library(data.table)

DT <- structure(list(id = c(1L, 1L, 2L, 3L, 3L, 4L), hire_year = c(2017L, 
2017L, 2017L, 2017L, 2016L, 2016L)), class = "data.frame", row.names = c(NA, 
-6L))
# Convert into data.table
setDT(DT)

# Count number of times "id" shows up
DT[, count := .N, by = .(id)]

# Create a dummy variable that equals 1 if count == 1
DT[, dummy := fifelse(count == 1, 1, 0)]


      id hire_year count dummy
   <int>     <int> <int> <num>
1:     1      2017     2     0
2:     1      2017     2     0
3:     2      2017     1     1
4:     3      2017     2     0
5:     3      2016     2     0
6:     4      2016     1     1
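
If the intermediate count column is not needed, the two steps can probably be merged into one grouped assignment. This is a sketch rather than the answerer's own code; it assumes an integer dummy is acceptable and builds the table directly with data.table().

```r
library(data.table)

# Example data from the question, created directly as a data.table
DT <- data.table(
  id = c(1L, 1L, 2L, 3L, 3L, 4L),
  hire_year = c(2017L, 2017L, 2017L, 2017L, 2016L, 2016L)
)

# .N is the number of rows in each id group; the single value computed
# per group is recycled to every row of that group
DT[, dummy := as.integer(.N == 1L), by = id]

DT$dummy
#> [1] 0 0 1 0 0 1
```

The result has only the id, hire_year, and dummy columns, matching the expected output in the question.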