How to add a common number to rows that have same value in another column?

Time:11-16

After years of using the advice you have given to other users, here is an issue I have not been able to solve so far...

I have a dataset with thousands of rows and hundreds of columns, in which one column can hold a value shared by several rows. Here is a subset of my dataset:

ID <- c("A", "B", "C", "D", "E")
Dose <- c("1", "5", "3", "4", "5")
Value <- c("x1", "x2", "x3", "x2", "x3")
mat <- cbind(ID, Dose, Value)

What I want is to assign the same code to all rows that share a value in the "Value" column, like this:

ID <- c("A", "B", "C", "D", "E")
Dose <- c("1", "5", "3", "4", "5")
Value <- c("153254", "258634", "896411", "258634", "896411")
Code <- c("1", "2", "3", "2", "3")
mat <- cbind(ID, Dose, Value, Code)

Does anyone have an idea that could help me?

Thanks!

CodePudding user response:

You should consider using a data.frame:

mat <- data.frame(ID, Dose, Value)

Using dplyr you could create the desired output:

library(dplyr)

mat %>% 
  group_by(Value) %>% 
  mutate(Code = cur_group_id()) %>% 
  ungroup()

This returns

# A tibble: 5 x 4
  ID    Dose  Value   Code
  <chr> <chr> <chr>  <int>
1 A     1     153254     1
2 B     5     258634     2
3 C     3     896411     3
4 D     4     258634     2
5 E     5     896411     3
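One caveat before applying this to real data: cur_group_id() numbers the groups in the sorted order of the grouping key, not in order of first appearance. The two happen to coincide in this example. The base R sketch below uses a hypothetical reordered vector `v` to show the difference, with as.integer(factor()) standing in for the sorted numbering:

```r
# Hypothetical vector whose first-appearance order differs from sorted order
v <- c("x3", "x1", "x3", "x2")

# Sorted-key numbering (stand-in for ids assigned in sorted key order)
code_sorted <- as.integer(factor(v))

# First-appearance numbering
code_first <- match(v, unique(v))

print(code_sorted)  # 3 1 3 2
print(code_first)   # 1 2 1 3
```

Either numbering gives rows with the same Value the same Code; only the label assigned to each group differs.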

CodePudding user response:

We may use match() here, which numbers each Value by the position of its first occurrence among the unique values:

library(dplyr)
mat %>% 
    mutate(Code = match(Value, unique(Value)))

-output

 ID Dose  Value Code
1  A    1 153254    1
2  B    5 258634    2
3  C    3 896411    3
4  D    4 258634    2
5  E    5 896411    3

data

mat <- data.frame(ID, Dose, Value)
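If dplyr is not available, the same idea can probably be written in base R. The sketch below rebuilds the example data from the question and also covers the original cbind() matrix; the names `df` and `m` are chosen here for illustration only.

```r
ID <- c("A", "B", "C", "D", "E")
Dose <- c("1", "5", "3", "4", "5")
Value <- c("x1", "x2", "x3", "x2", "x3")

# Data frame variant: number each Value by its first appearance
df <- data.frame(ID, Dose, Value)
df$Code <- match(df$Value, unique(df$Value))

# Matrix variant: cbind() coerces the integer codes to character
m <- cbind(ID, Dose, Value)
m <- cbind(m, Code = match(m[, "Value"], unique(m[, "Value"])))

print(df)
print(m)
```

In the data frame the Code column stays an integer, whereas in the character matrix it is stored as text, which is one reason a data.frame may be the more convenient container here.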

CodePudding user response:

To generate unique values, we could use a hash function. Here is one approach using the fst package, which implements xxHash. The benefit is that the values are nicely spaced out and the probability of collisions is extremely low, while hashing is still very fast. When the data reaches a few million distinct groups, the [1] should be removed to make use of the full 64-bit key.

ID <- c("A", "B", "C", "D", "E")
Dose <- c("1", "5", "3", "4", "5")
Value <- c("x1", "x2", "x3", "x2", "x3")
mat <- cbind(ID, Dose, Value)

# Hash each Value string; hash_fst() returns two 32-bit integers, [1] keeps the first
mat[,"Value"] <- 
  lapply(mat[,"Value"], charToRaw) |>
  lapply(\(x) fst::hash_fst(x, block_hash = FALSE)[1]) |>
  unlist(use.names = FALSE)

     ID  Dose Value       
[1,] "A" "1"  "1212139790"
[2,] "B" "5"  "1379455937"
[3,] "C" "3"  "756640974" 
[4,] "D" "4"  "1379455937"
[5,] "E" "5"  "756640974" 
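Since a hash can collide in principle, it may be worth confirming that the hashed codes still map one-to-one onto the original values. A minimal base R check, using the original values and the hashed output shown above (the variable names `original` and `hashed` are hypothetical):

```r
original <- c("x1", "x2", "x3", "x2", "x3")
hashed <- c("1212139790", "1379455937", "756640974", "1379455937", "756640974")

# Hashing is deterministic, so a collision would show up as fewer
# distinct hashes than distinct original values
n_values <- length(unique(original))
n_hashes <- length(unique(hashed))
no_collision <- n_values == n_hashes

print(no_collision)
```

The check has to run against a copy of the original column, because the hashing step overwrites mat[,"Value"] in place.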
Tags: r