I have a dataframe with one identifier column of unique values, and one column which contains specific criteria.
I want to create a new identifier column of unique values, where each value also encodes which criterion the row meets. In the example below, I have used case_when() and seq_along() to accomplish this:
library(dplyr)

set.seed(1)
df <- data.frame(
  ID = LETTERS[1:10],
  Criteria = paste0("Crit ", floor(runif(10, min = 1, max = 4)))
)

df %>%
  mutate(
    ID2 = case_when(
      Criteria == "Crit 1" ~ paste0("x", seq_along(Criteria)),
      Criteria == "Crit 2" ~ paste0("y", seq_along(Criteria)),
      Criteria == "Crit 3" ~ paste0("z", seq_along(Criteria))
    )
  )
Output:
A data.frame: 10 × 3
ID  Criteria  ID2
A   Crit 1    x1
B   Crit 2    y2
C   Crit 2    y3
D   Crit 3    z4
E   Crit 1    x5
F   Crit 3    z6
G   Crit 3    z7
H   Crit 2    y8
I   Crit 2    y9
J   Crit 1    x10
The new column, ID2, now has values that are both unique (numbers 1 to 10) and identify the criterion (letters x, y and z). However, seq_along() numbers every row in sequence regardless of criterion. I'd rather the count restart at 1 for each criterion (e.g. for criterion Crit 1: x1, x2, x3, ..., xn; for Crit 2: y1, y2, y3, ..., ym; etc.).
What I want:
A data.frame: 10 × 3
ID  Criteria  ID2
A   Crit 1    x1
B   Crit 2    y1
C   Crit 2    y2
D   Crit 3    z1
E   Crit 1    x2
F   Crit 3    z2
G   Crit 3    z3
H   Crit 2    y3
I   Crit 2    y4
J   Crit 1    x3
Answer:
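You can just add group_by(Criteria) before mutate(), so that seq_along() is evaluated within each group and the count restarts at 1 for each criterion: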
library(dplyr)

df %>%
  group_by(Criteria) %>%
  mutate(
    ID2 = case_when(
      Criteria == "Crit 1" ~ paste0("x", seq_along(Criteria)),
      Criteria == "Crit 2" ~ paste0("y", seq_along(Criteria)),
      Criteria == "Crit 3" ~ paste0("z", seq_along(Criteria))
    )
  )
Output:
# A tibble: 10 × 3
# Groups: Criteria [3]
ID Criteria ID2
<chr> <chr> <chr>
1 A Crit 1 x1
2 B Crit 2 y1
3 C Crit 2 y2
4 D Crit 3 z1
5 E Crit 1 x2
6 F Crit 3 z2
7 G Crit 3 z3
8 H Crit 2 y3
9 I Crit 2 y4
10 J Crit 1 x3
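
If the list of criteria grows, repeating one case_when() branch per criterion gets tedious. A possible variation, offered only as a sketch, is to keep the prefixes in a named lookup vector and number the rows within each group using row_number(). The names prefixes and result are made up for this illustration and are not part of the original code. The trailing ungroup() is optional and simply drops the grouping from the result.

```r
library(dplyr)

# Same example data as in the question
set.seed(1)
df <- data.frame(
  ID = LETTERS[1:10],
  Criteria = paste0("Crit ", floor(runif(10, min = 1, max = 4)))
)

# Hypothetical lookup vector: one prefix per criterion (an assumption for this sketch)
prefixes <- c("Crit 1" = "x", "Crit 2" = "y", "Crit 3" = "z")

result <- df %>%
  group_by(Criteria) %>%
  mutate(
    # Look up the prefix by criterion name, then append the within-group row number
    ID2 = paste0(unname(prefixes[as.character(Criteria)]), row_number())
  ) %>%
  ungroup()

print(result)
```

A criterion that is missing from prefixes would get the prefix "NA" here, so the lookup vector needs an entry for every value that can occur in Criteria.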