How to create letter sequence within group?-CodePudding

I have a dataset with three identifiers. Say one identifies the country, another identifies time, and another which identifies a person. Like the data frame below:

   country person time
1        A   John    1
2        A   John    2
3        A   John    3
4        A  Peter    1
5        A  Peter    2
6        A  Peter    3
7        B  David    1
8        B  Thomas   2
9        B  David    3
10       B  Adam     1
11       B  Adam     2
12       B  Thomas   3

How can I create a variable that generates a sequence of letters that will identify each person by Country? The output should look like this:

        country person time   Letterseq
    1        A   John    1    A
    2        A   John    2    A
    3        A   John    3    A
    4        A  Peter    1    B
    5        A  Peter    2    B
    6        A  Peter    3    B
    7        B  David    1    A
    8        B  Thomas   2    B
    9        B  David    3    A
    10       B  Adam     1    C
    11       B  Adam     2    C
    12       B  Thomas   3    B

Let me know if you need more clarification.

CodePudding user response：

If the unique person length are less than 26 in each 'country', group by 'country', get a numeric index by matching 'person' with unique values of 'person', use the index to return the corresponding values from inbuilt vector 'LETTERS'

library(dplyr)
df1 <- df1 %>%
   group_by(country) %>% 
   mutate(Letterseq = LETTERS[match(person, unique(person))]) %>% 
   ungroup

-output

df1
# A tibble: 12 × 4
   country person  time Letterseq
   <chr>   <chr>  <int> <chr>    
 1 A       John       1 A        
 2 A       John       2 A        
 3 A       John       3 A        
 4 A       Peter      1 B        
 5 A       Peter      2 B        
 6 A       Peter      3 B        
 7 B       David      1 A        
 8 B       Thomas     2 B        
 9 B       David      3 A        
10 B       Adam       1 C        
11 B       Adam       2 C        
12 B       Thomas     3 B

data

df1 <- structure(list(country = c("A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B"), person = c("John", "John", "John", 
"Peter", "Peter", "Peter", "David", "Thomas", "David", "Adam", 
"Adam", "Thomas"), time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12"))

CodePudding user response：

With data.table:

dt[order(country,person),LetterSeq:=LETTERS[rleid(person)],by=country][]

    country person  time LetterSeq
     <char> <char> <int>    <char>
 1:       A   John     1         A
 2:       A   John     2         A
 3:       A   John     3         A
 4:       A  Peter     1         B
 5:       A  Peter     2         B
 6:       A  Peter     3         B
 7:       B  David     1         B
 8:       B Thomas     2         C
 9:       B  David     3         B
10:       B   Adam     1         A
11:       B   Adam     2         A
12:       B Thomas     3         C