I have a dataset with three identifiers. Say one identifies the country, another identifies time, and another which identifies a person. Like the data frame below:
country person time
1 A John 1
2 A John 2
3 A John 3
4 A Peter 1
5 A Peter 2
6 A Peter 3
7 B David 1
8 B Thomas 2
9 B David 3
10 B Adam 1
11 B Adam 2
12 B Thomas 3
How can I create a variable that generates a sequence of letters that will identify each person by Country? The output should look like this:
country person time Letterseq
1 A John 1 A
2 A John 2 A
3 A John 3 A
4 A Peter 1 B
5 A Peter 2 B
6 A Peter 3 B
7 B David 1 A
8 B Thomas 2 B
9 B David 3 A
10 B Adam 1 C
11 B Adam 2 C
12 B Thomas 3 B
Let me know if you need more clarification.
CodePudding user response:
If the unique person length are less than 26 in each 'country', group by 'country', get a numeric index by match
ing 'person' with unique
values of 'person', use the index to return the corresponding values from inbuilt vector 'LETTERS'
library(dplyr)
df1 <- df1 %>%
group_by(country) %>%
mutate(Letterseq = LETTERS[match(person, unique(person))]) %>%
ungroup
-output
df1
# A tibble: 12 × 4
country person time Letterseq
<chr> <chr> <int> <chr>
1 A John 1 A
2 A John 2 A
3 A John 3 A
4 A Peter 1 B
5 A Peter 2 B
6 A Peter 3 B
7 B David 1 A
8 B Thomas 2 B
9 B David 3 A
10 B Adam 1 C
11 B Adam 2 C
12 B Thomas 3 B
data
df1 <- structure(list(country = c("A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), person = c("John", "John", "John",
"Peter", "Peter", "Peter", "David", "Thomas", "David", "Adam",
"Adam", "Thomas"), time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12"))
CodePudding user response:
With data.table
:
dt[order(country,person),LetterSeq:=LETTERS[rleid(person)],by=country][]
country person time LetterSeq
<char> <char> <int> <char>
1: A John 1 A
2: A John 2 A
3: A John 3 A
4: A Peter 1 B
5: A Peter 2 B
6: A Peter 3 B
7: B David 1 B
8: B Thomas 2 C
9: B David 3 B
10: B Adam 1 A
11: B Adam 2 A
12: B Thomas 3 C