How can a list of numerical identifiers be added to a dataframe in R?

Time:06-02

Hello and welcome to my question.

I want to create a list of unique, numerical, position-based identifiers to denote the position of items in a column of values in a dataframe.

Here's an example of what I want (in column Position):

library(tibble)

df <- tibble(Fruit = c(rep("Apple", 3), rep("Pear", 2), rep("Orange", 4)),
             Variety = c("Gala", "Envy", "Pink Lady", "Anjou", "Bartlett", "Blood", "Seville", "Mandarin", "Bergamot"),
             Position = c(1.1, 1.2, 1.3, 2.1, 2.2, 3.1, 3.2, 3.3, 3.4))

As you can see, the integer part of the identifier denotes the Fruit (numbered in order of first appearance), while the decimal part denotes the position of the Variety within that Fruit.

I'd like a method to create these unique position identifiers for use in another project (using ganttrify, an excellent but particular function). Bonus points for a dplyr-friendly solution.

CodePudding user response:

We can use match to create the first index based on 'Fruit', use rowid (from data.table) to get the within-group sequence index, and combine the two with sprintf:

library(dplyr)
library(data.table)  # provides rowid()

df %>%
   # match() gives the group index, rowid() the running count within each Fruit
   mutate(Position2 = sprintf('%d.%d', match(Fruit, unique(Fruit)), rowid(Fruit)))

-output

# A tibble: 9 × 4
  Fruit  Variety   Position Position2
  <chr>  <chr>        <dbl> <chr>    
1 Apple  Gala           1.1 1.1      
2 Apple  Envy           1.2 1.2      
3 Apple  Pink Lady      1.3 1.3      
4 Pear   Anjou          2.1 2.1      
5 Pear   Bartlett       2.2 2.2      
6 Orange Blood          3.1 3.1      
7 Orange Seville        3.2 3.2      
8 Orange Mandarin       3.3 3.3      
9 Orange Bergamot       3.4 3.4  
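
To see what each piece of that one-liner contributes, here is a minimal sketch that computes the two indices separately on a plain vector. The names fruit, group_id, within_id and position are only illustrative and are not part of the answer above.

```r
library(data.table)  # only needed for rowid()

fruit <- c(rep("Apple", 3), rep("Pear", 2), rep("Orange", 4))

# Index of each value among the unique values, in order of first appearance
group_id <- match(fruit, unique(fruit))

# Running counter that restarts whenever a new value of fruit starts counting
within_id <- rowid(fruit)

# Glue the two integer indices together as "group.position"
position <- sprintf("%d.%d", group_id, within_id)
position
```

Note that the result is a character vector, which is why Position2 prints as <chr> rather than <dbl> in the output above.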

CodePudding user response:

A base R option using paste0, match, and ave:

> transform(df, position2 = paste0(match(Fruit,unique(Fruit)), ".", ave(Variety, Fruit, FUN = seq_along)))
   Fruit   Variety Position position2
1  Apple      Gala      1.1       1.1
2  Apple      Envy      1.2       1.2
3  Apple Pink Lady      1.3       1.3
4   Pear     Anjou      2.1       2.1
5   Pear  Bartlett      2.2       2.2
6 Orange     Blood      3.1       3.1
7 Orange   Seville      3.2       3.2
8 Orange  Mandarin      3.3       3.3
9 Orange  Bergamot      3.4       3.4

CodePudding user response:

Here is a dplyr way using group_indices. The trick is to overcome the alphabetical ordering of group_indices by converting Fruit to a factor whose levels follow the order of first appearance:

library(dplyr)

df %>% 
  mutate(id = group_indices(., factor(Fruit, levels = unique(Fruit)))) %>% 
  group_by(Fruit) %>% 
  mutate(Position2 = paste(id, row_number(), sep = "."), .keep="unused") %>%
  ungroup()

-output

  Fruit  Variety   Position Position2
  <chr>  <chr>        <dbl> <chr>    
1 Apple  Gala           1.1 1.1      
2 Apple  Envy           1.2 1.2      
3 Apple  Pink Lady      1.3 1.3      
4 Pear   Anjou          2.1 2.1      
5 Pear   Bartlett       2.2 2.2      
6 Orange Blood          3.1 3.1      
7 Orange Seville        3.2 3.2      
8 Orange Mandarin       3.3 3.3      
9 Orange Bergamot       3.4 3.4      
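
As far as I know, dplyr 1.0.0 and later flag passing grouping expressions to group_indices as deprecated and point to cur_group_id() instead. The following is a sketch of the same idea under that assumption; the helper column grp and the object result are hypothetical names introduced here, not part of the answer above.

```r
library(dplyr)

df <- tibble(Fruit = c(rep("Apple", 3), rep("Pear", 2), rep("Orange", 4)),
             Variety = c("Gala", "Envy", "Pink Lady", "Anjou", "Bartlett",
                         "Blood", "Seville", "Mandarin", "Bergamot"))

result <- df %>%
  # Group by a factor whose levels follow first appearance, so that groups
  # are numbered Apple = 1, Pear = 2, Orange = 3 instead of alphabetically
  group_by(grp = factor(Fruit, levels = unique(Fruit))) %>%
  # cur_group_id() is the group's index, row_number() the position within it
  mutate(Position2 = paste(cur_group_id(), row_number(), sep = ".")) %>%
  ungroup() %>%
  select(-grp)  # drop the temporary grouping column

result
```

Grouping by a temporary factor column leaves Fruit itself as a character column, which keeps the result close to the original data.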

CodePudding user response:

Take a look at rank, and in particular its ties.method argument. You may want to use ties.method = "first".
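
The answer does not spell out how rank would be applied, so the following is only one possible reading, sketched in base R. It relies on the fact that ties.method = "first" numbers tied values in order of appearance, while ties.method = "min" gives every tied value the lowest rank of its group; their difference plus one is therefore the position within each Fruit. The group index still comes from match, as in the other answers, and the variable names are illustrative.

```r
fruit <- c(rep("Apple", 3), rep("Pear", 2), rep("Orange", 4))

# Ranks with ties broken by order of appearance
r_first <- rank(fruit, ties.method = "first")

# Ranks where all tied values share the lowest rank of their group
r_min <- rank(fruit, ties.method = "min")

# Position of each row within its own Fruit
within_id <- r_first - r_min + 1L

# Group index in order of first appearance (rank alone would sort alphabetically)
group_id <- match(fruit, unique(fruit))

position <- paste(group_id, within_id, sep = ".")
position
```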
