Home > Enterprise >  Recode two non sequential indices to the same values
Recode two non sequential indices to the same values

Time:11-03

I'm working with a data.frame that contains two columns, sample_id1 and sample_id2. Both columns contain non sequential values between 1 and 403. I would like to recode both sample_id1 and sample_id2 so that the values become sequential.

Example:

1 7 9 10 14 15   becomes 1 2 3  4  5  6

15 14 10 9 7 1   becomes 6 5 4  3  2  1 

Example data:

df_test<-structure(list(sample_id1 = c(284L, 313L, 319L, 359L, 196L, 358L, 
378L, 277L, 206L, 385L, 356L, 291L, 357L, 375L, 140L, 231L, 386L, 
387L, 94L, 94L, 211L, 335L, 311L, 323L, 378L, 356L, 227L, 196L, 
376L, 374L, 392L, 399L, 244L, 346L, 319L, 397L, 358L, 279L, 156L, 
375L, 374L, 322L, 258L, 292L, 193L, 339L, 396L, 377L, 326L, 120L, 
259L, 234L, 105L, 263L, 306L, 94L, 231L, 221L, 279L, 320L, 318L, 
357L, 325L, 266L, 230L, 263L, 352L, 327L, 156L, 378L, 371L, 108L, 
366L, 277L, 387L, 257L, 274L, 375L, 327L, 105L, 108L, 373L, 313L, 
316L, 379L, 43L, 210L, 126L, 378L, 290L, 54L, 138L, 222L, 290L, 
278L, 397L, 357L, 88L, 278L, 196L, 199L, 317L, 89L, 379L, 198L, 
218L, 110L, 310L, 373L, 269L, 272L, 140L, 399L, 392L, 248L, 278L, 
389L, 357L, 311L, 281L, 267L, 85L, 126L, 365L, 259L, 211L, 31L, 
276L, 329L, 263L, 370L, 324L, 367L, 255L, 170L, 341L, 119L, 373L, 
359L, 195L, 373L, 93L, 384L, 193L, 138L, 227L, 45L, 273L, 267L, 
316L, 369L, 243L, 339L, 299L, 271L, 278L, 321L, 381L, 322L, 333L, 
201L, 282L, 319L, 357L, 261L, 246L, 310L, 264L, 376L, 376L, 318L, 
377L, 133L, 76L, 357L, 379L, 214L, 93L, 364L, 399L, 382L, 253L, 
222L, 356L, 403L, 156L, 321L, 133L, 293L, 365L, 292L, 367L, 271L, 
366L, 107L, 271L, 249L, 195L, 310L, 282L, 349L, 374L, 249L, 386L, 
357L, 252L, 281L, 372L, 113L, 258L, 303L, 378L, 322L, 377L, 371L, 
275L, 95L, 241L, 375L, 287L, 318L, 161L, 107L, 272L, 111L, 397L, 
358L, 383L, 313L, 111L, 381L, 97L, 355L, 316L, 319L, 317L, 303L, 
398L, 392L, 375L, 106L, 356L, 271L, 283L, 54L, 179L, 305L, 99L, 
387L, 187L, 248L, 313L, 185L, 377L, 372L, 202L, 291L, 378L, 50L, 
111L, 277L, 378L, 332L, 377L, 309L, 276L, 145L, 378L, 263L, 340L, 
73L, 141L, 266L, 364L, 377L, 403L, 244L, 263L, 72L, 122L, 339L, 
243L, 152L, 172L, 320L, 249L, 378L, 381L, 274L, 399L, 297L, 306L, 
161L, 271L, 319L, 94L, 125L, 187L, 138L, 287L, 329L, 183L, 246L, 
246L, 195L, 302L, 376L, 290L, 378L, 169L, 255L, 234L, 144L, 380L, 
357L, 269L, 377L, 145L, 266L, 397L, 390L, 115L, 246L, 250L, 347L, 
357L, 151L, 273L, 56L, 372L, 325L, 95L, 373L, 313L, 319L, 267L, 
319L, 231L, 327L, 279L, 379L, 376L, 214L, 86L, 94L, 322L, 76L, 
378L, 320L, 381L, 255L, 263L, 124L, 381L, 377L, 282L, 285L, 99L, 
238L, 161L, 329L, 379L, 241L, 340L, 279L, 374L), sample_id2 = c(56L, 
218L, 246L, 120L, 15L, 193L, 375L, 102L, 114L, 91L, 135L, 2L, 
316L, 349L, 65L, 167L, 378L, 127L, 15L, 91L, 128L, 1L, 246L, 
291L, 376L, 29L, 135L, 99L, 170L, 359L, 287L, 135L, 31L, 264L, 
138L, 40L, 13L, 124L, 147L, 102L, 364L, 179L, 76L, 291L, 44L, 
241L, 163L, 277L, 325L, 119L, 86L, 93L, 41L, 174L, 170L, 64L, 
160L, 114L, 12L, 42L, 73L, 275L, 193L, 115L, 151L, 63L, 239L, 
76L, 130L, 8L, 350L, 10L, 277L, 29L, 163L, 31L, 116L, 135L, 243L, 
73L, 90L, 123L, 166L, 81L, 56L, 29L, 118L, 10L, 303L, 235L, 34L, 
106L, 3L, 255L, 104L, 102L, 161L, 83L, 249L, 124L, 154L, 161L, 
36L, 267L, 190L, 166L, 109L, 106L, 151L, 190L, 221L, 90L, 111L, 
44L, 163L, 83L, 134L, 317L, 310L, 179L, 45L, 66L, 101L, 47L, 
94L, 25L, 21L, 275L, 281L, 250L, 249L, 162L, 241L, 172L, 47L, 
199L, 106L, 104L, 282L, 92L, 47L, 76L, 4L, 52L, 81L, 29L, 6L, 
209L, 97L, 8L, 299L, 66L, 281L, 199L, 214L, 151L, 201L, 279L, 
22L, 23L, 83L, 275L, 106L, 119L, 68L, 135L, 42L, 28L, 107L, 375L, 
316L, 372L, 9L, 48L, 107L, 99L, 36L, 56L, 316L, 310L, 1L, 24L, 
31L, 268L, 292L, 108L, 106L, 130L, 286L, 245L, 254L, 281L, 15L, 
120L, 43L, 258L, 81L, 89L, 105L, 276L, 303L, 276L, 219L, 376L, 
113L, 31L, 101L, 349L, 81L, 231L, 135L, 365L, 251L, 41L, 308L, 
107L, 34L, 145L, 356L, 223L, 105L, 33L, 105L, 13L, 29L, 306L, 
125L, 126L, 84L, 73L, 94L, 94L, 239L, 151L, 170L, 73L, 185L, 
222L, 338L, 105L, 42L, 83L, 231L, 98L, 36L, 147L, 117L, 97L, 
241L, 92L, 101L, 18L, 170L, 77L, 135L, 28L, 254L, 43L, 45L, 42L, 
104L, 245L, 249L, 135L, 156L, 105L, 20L, 239L, 12L, 313L, 7L, 
22L, 50L, 161L, 376L, 324L, 49L, 85L, 24L, 108L, 24L, 36L, 25L, 
134L, 106L, 73L, 321L, 167L, 148L, 376L, 295L, 185L, 73L, 76L, 
102L, 85L, 30L, 91L, 29L, 44L, 156L, 76L, 132L, 245L, 86L, 161L, 
349L, 172L, 275L, 127L, 235L, 12L, 134L, 89L, 73L, 87L, 354L, 
140L, 231L, 357L, 291L, 91L, 106L, 86L, 89L, 302L, 43L, 272L, 
12L, 151L, 37L, 93L, 106L, 270L, 29L, 15L, 111L, 196L, 266L, 
99L, 259L, 40L, 6L, 45L, 56L, 116L, 6L, 372L, 88L, 174L, 218L, 
231L, 36L, 12L, 359L, 135L, 164L, 92L, 152L, 151L, 116L, 258L, 
122L, 98L, 36L, 106L)), row.names = c(NA, -366L), class = "data.frame")

CodePudding user response:

You can use rank() but that would not deal correctly with repetitions, instead you could use data.table::frank()

library(data.table)
frank(c(1, 7, 9, 10, 14, 15), ties.method = 'dense')
# [1] 1 2 3 4 5 6

frank(c(15, 14, 10, 9, 7, 1), ties.method = 'dense')
# [1] 6 5 4 3 2 1

# Since you have duplicated entries it needs to deal correctly with that:
frank(c(15, 14, 10, 9, 7, 1, 15), ties.method = 'dense')
# [1] 6 5 4 3 2 1 6

If you prefer base R you could use match() sort() unique():

x <- c(15, 14, 10, 9, 7, 1, 15, 16)
match(x, unique(sort(x)))
# [1] 6 5 4 3 2 1 6 7
  • Related