Hello and welcome to my question.
I want to create a list of unique, numerical, position-based identifiers to denote the position of items in a column of values in a dataframe.
Here's an example of what I want (in column Position):
tibble(Fruit = c(rep("Apple", 3), rep("Pear", 2), rep("Orange", 4)),
Variety = c("Gala", "Envy", "Pink Lady", "Anjou", "Bartlett", "Blood", "Seville", "Mandarin", "Bergamot"),
Position = c(1.1, 1.2, 1.3, 2.1, 2.2, 3.1, 3.2, 3.3, 3.4))
As you can see, the whole-number part of the identifier denotes the Fruit, while the decimal part denotes the Variety.
I'd like a method to create these unique position identifiers for use in another project (using ganttrify, an excellent but particular function). Bonus points for a dplyr-friendly solution.
CodePudding user response:
We can use match to create the first index based on 'Fruit', and rowid (from data.table) to get the second, within-group sequence index, then paste the two together with sprintf:
library(dplyr)
library(data.table)
df1 %>%
mutate(Position2 = sprintf('%d.%d', match(Fruit, unique(Fruit)), rowid(Fruit)))
Output:
# A tibble: 9 × 4
Fruit Variety Position Position2
<chr> <chr> <dbl> <chr>
1 Apple Gala 1.1 1.1
2 Apple Envy 1.2 1.2
3 Apple Pink Lady 1.3 1.3
4 Pear Anjou 2.1 2.1
5 Pear Bartlett 2.2 2.2
6 Orange Blood 3.1 3.1
7 Orange Seville 3.2 3.2
8 Orange Mandarin 3.3 3.3
9 Orange Bergamot 3.4 3.4
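Note that Position2 here is a character column, while Position in the question is numeric. If the identifiers are later converted to numbers, a group with ten or more items would produce collisions, since "1.1" and "1.10" are the same numeric value. A minimal sketch of the issue, using made-up identifiers rather than the question's data; the zero-padding workaround is only one possible option and an assumption on my part:

```r
# Hypothetical identifiers for a group that has at least ten items
ids <- c("1.1", "1.2", "1.10")

# Converted to numeric, "1.1" and "1.10" become the same value
as_num <- as.numeric(ids)
has_collision <- anyDuplicated(as_num) > 0

# One possible workaround (an assumption, not part of the answer above):
# zero-pad the within-group index so the numeric values stay distinct
padded <- sprintf("%d.%02d", 1L, c(1L, 2L, 10L))
padded_num <- as.numeric(padded)
```

Keeping the identifier as a character string avoids the problem entirely, if the downstream function accepts it.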
CodePudding user response:
A base R option using paste0, match, and ave:
> transform(df, position2 = paste0(match(Fruit,unique(Fruit)), ".", ave(Variety, Fruit, FUN = seq_along)))
Fruit Variety Position position2
1 Apple Gala 1.1 1.1
2 Apple Envy 1.2 1.2
3 Apple Pink Lady 1.3 1.3
4 Pear Anjou 2.1 2.1
5 Pear Bartlett 2.2 2.2
6 Orange Blood 3.1 3.1
7 Orange Seville 3.2 3.2
8 Orange Mandarin 3.3 3.3
9 Orange Bergamot 3.4 3.4
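To make the one-liner easier to follow, here is a sketch that builds the two parts of the identifier separately. It rebuilds the question's data as a plain data.frame; the names group_id and item_id are only illustrative, and ave is applied to a row index rather than to Variety so the counter stays an integer:

```r
# Example data from the question, rebuilt as a plain data.frame
df <- data.frame(
  Fruit = c(rep("Apple", 3), rep("Pear", 2), rep("Orange", 4)),
  Variety = c("Gala", "Envy", "Pink Lady", "Anjou", "Bartlett",
              "Blood", "Seville", "Mandarin", "Bergamot")
)

# Group index: position of each Fruit among the unique values,
# in order of first appearance (not alphabetical)
group_id <- match(df$Fruit, unique(df$Fruit))

# Within-group index: a running count that restarts for each Fruit
item_id <- ave(seq_along(df$Fruit), df$Fruit, FUN = seq_along)

# Combine the two parts into a character identifier
df$Position2 <- paste0(group_id, ".", item_id)
```

Because match works against unique(Fruit), the groups are numbered in the order they first appear in the data, which is why Pear gets 2 and Orange gets 3.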
CodePudding user response:
Here is a dplyr way using group_indices. The trick is to overcome the alphabetical ordering of group_indices by converting Fruit to a factor whose levels follow the order of appearance, as in the code below:
library(dplyr)
df %>%
mutate(id = group_indices(., factor(Fruit, levels = unique(Fruit)))) %>%
group_by(Fruit) %>%
mutate(Position2 = paste(id, row_number(), sep = "."), .keep="unused") %>%
ungroup()
Fruit Variety Position Position2
<chr> <chr> <dbl> <chr>
1 Apple Gala 1.1 1.1
2 Apple Envy 1.2 1.2
3 Apple Pink Lady 1.3 1.3
4 Pear Anjou 2.1 2.1
5 Pear Bartlett 2.2 2.2
6 Orange Blood 3.1 3.1
7 Orange Seville 3.2 3.2
8 Orange Mandarin 3.3 3.3
9 Orange Bergamot 3.4 3.4
CodePudding user response:
Take a look at rank. Look at the ties.method argument. You may want to use ties.method="first".
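One possible reading of this suggestion, sketched in base R: within a single fruit every row is tied, so rank with ties.method = "first" numbers the rows in order of appearance. The use of ave and of match for the group index is my assumption about how the pieces would fit together, not something stated above:

```r
fruit <- c(rep("Apple", 3), rep("Pear", 2), rep("Orange", 4))

# Within each group all values are tied, so ties.method = "first"
# numbers the rows 1, 2, 3, ... in order of appearance
item_id <- ave(rep(0, length(fruit)), fruit,
               FUN = function(x) rank(x, ties.method = "first"))

# rank() on the character vector itself would order the fruits
# alphabetically, so the group index comes from match() instead
group_id <- match(fruit, unique(fruit))

position <- paste0(group_id, ".", item_id)
```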