I'm working with a data.frame
that contains two columns, sample_id1
and sample_id2
. Both columns contain non sequential values between 1 and 403. I would like to recode both sample_id1
and sample_id2
so that the values become sequential.
Example:
1 7 9 10 14 15 becomes 1 2 3 4 5 6
15 14 10 9 7 1 becomes 6 5 4 3 2 1
Example data:
df_test<-structure(list(sample_id1 = c(284L, 313L, 319L, 359L, 196L, 358L,
378L, 277L, 206L, 385L, 356L, 291L, 357L, 375L, 140L, 231L, 386L,
387L, 94L, 94L, 211L, 335L, 311L, 323L, 378L, 356L, 227L, 196L,
376L, 374L, 392L, 399L, 244L, 346L, 319L, 397L, 358L, 279L, 156L,
375L, 374L, 322L, 258L, 292L, 193L, 339L, 396L, 377L, 326L, 120L,
259L, 234L, 105L, 263L, 306L, 94L, 231L, 221L, 279L, 320L, 318L,
357L, 325L, 266L, 230L, 263L, 352L, 327L, 156L, 378L, 371L, 108L,
366L, 277L, 387L, 257L, 274L, 375L, 327L, 105L, 108L, 373L, 313L,
316L, 379L, 43L, 210L, 126L, 378L, 290L, 54L, 138L, 222L, 290L,
278L, 397L, 357L, 88L, 278L, 196L, 199L, 317L, 89L, 379L, 198L,
218L, 110L, 310L, 373L, 269L, 272L, 140L, 399L, 392L, 248L, 278L,
389L, 357L, 311L, 281L, 267L, 85L, 126L, 365L, 259L, 211L, 31L,
276L, 329L, 263L, 370L, 324L, 367L, 255L, 170L, 341L, 119L, 373L,
359L, 195L, 373L, 93L, 384L, 193L, 138L, 227L, 45L, 273L, 267L,
316L, 369L, 243L, 339L, 299L, 271L, 278L, 321L, 381L, 322L, 333L,
201L, 282L, 319L, 357L, 261L, 246L, 310L, 264L, 376L, 376L, 318L,
377L, 133L, 76L, 357L, 379L, 214L, 93L, 364L, 399L, 382L, 253L,
222L, 356L, 403L, 156L, 321L, 133L, 293L, 365L, 292L, 367L, 271L,
366L, 107L, 271L, 249L, 195L, 310L, 282L, 349L, 374L, 249L, 386L,
357L, 252L, 281L, 372L, 113L, 258L, 303L, 378L, 322L, 377L, 371L,
275L, 95L, 241L, 375L, 287L, 318L, 161L, 107L, 272L, 111L, 397L,
358L, 383L, 313L, 111L, 381L, 97L, 355L, 316L, 319L, 317L, 303L,
398L, 392L, 375L, 106L, 356L, 271L, 283L, 54L, 179L, 305L, 99L,
387L, 187L, 248L, 313L, 185L, 377L, 372L, 202L, 291L, 378L, 50L,
111L, 277L, 378L, 332L, 377L, 309L, 276L, 145L, 378L, 263L, 340L,
73L, 141L, 266L, 364L, 377L, 403L, 244L, 263L, 72L, 122L, 339L,
243L, 152L, 172L, 320L, 249L, 378L, 381L, 274L, 399L, 297L, 306L,
161L, 271L, 319L, 94L, 125L, 187L, 138L, 287L, 329L, 183L, 246L,
246L, 195L, 302L, 376L, 290L, 378L, 169L, 255L, 234L, 144L, 380L,
357L, 269L, 377L, 145L, 266L, 397L, 390L, 115L, 246L, 250L, 347L,
357L, 151L, 273L, 56L, 372L, 325L, 95L, 373L, 313L, 319L, 267L,
319L, 231L, 327L, 279L, 379L, 376L, 214L, 86L, 94L, 322L, 76L,
378L, 320L, 381L, 255L, 263L, 124L, 381L, 377L, 282L, 285L, 99L,
238L, 161L, 329L, 379L, 241L, 340L, 279L, 374L), sample_id2 = c(56L,
218L, 246L, 120L, 15L, 193L, 375L, 102L, 114L, 91L, 135L, 2L,
316L, 349L, 65L, 167L, 378L, 127L, 15L, 91L, 128L, 1L, 246L,
291L, 376L, 29L, 135L, 99L, 170L, 359L, 287L, 135L, 31L, 264L,
138L, 40L, 13L, 124L, 147L, 102L, 364L, 179L, 76L, 291L, 44L,
241L, 163L, 277L, 325L, 119L, 86L, 93L, 41L, 174L, 170L, 64L,
160L, 114L, 12L, 42L, 73L, 275L, 193L, 115L, 151L, 63L, 239L,
76L, 130L, 8L, 350L, 10L, 277L, 29L, 163L, 31L, 116L, 135L, 243L,
73L, 90L, 123L, 166L, 81L, 56L, 29L, 118L, 10L, 303L, 235L, 34L,
106L, 3L, 255L, 104L, 102L, 161L, 83L, 249L, 124L, 154L, 161L,
36L, 267L, 190L, 166L, 109L, 106L, 151L, 190L, 221L, 90L, 111L,
44L, 163L, 83L, 134L, 317L, 310L, 179L, 45L, 66L, 101L, 47L,
94L, 25L, 21L, 275L, 281L, 250L, 249L, 162L, 241L, 172L, 47L,
199L, 106L, 104L, 282L, 92L, 47L, 76L, 4L, 52L, 81L, 29L, 6L,
209L, 97L, 8L, 299L, 66L, 281L, 199L, 214L, 151L, 201L, 279L,
22L, 23L, 83L, 275L, 106L, 119L, 68L, 135L, 42L, 28L, 107L, 375L,
316L, 372L, 9L, 48L, 107L, 99L, 36L, 56L, 316L, 310L, 1L, 24L,
31L, 268L, 292L, 108L, 106L, 130L, 286L, 245L, 254L, 281L, 15L,
120L, 43L, 258L, 81L, 89L, 105L, 276L, 303L, 276L, 219L, 376L,
113L, 31L, 101L, 349L, 81L, 231L, 135L, 365L, 251L, 41L, 308L,
107L, 34L, 145L, 356L, 223L, 105L, 33L, 105L, 13L, 29L, 306L,
125L, 126L, 84L, 73L, 94L, 94L, 239L, 151L, 170L, 73L, 185L,
222L, 338L, 105L, 42L, 83L, 231L, 98L, 36L, 147L, 117L, 97L,
241L, 92L, 101L, 18L, 170L, 77L, 135L, 28L, 254L, 43L, 45L, 42L,
104L, 245L, 249L, 135L, 156L, 105L, 20L, 239L, 12L, 313L, 7L,
22L, 50L, 161L, 376L, 324L, 49L, 85L, 24L, 108L, 24L, 36L, 25L,
134L, 106L, 73L, 321L, 167L, 148L, 376L, 295L, 185L, 73L, 76L,
102L, 85L, 30L, 91L, 29L, 44L, 156L, 76L, 132L, 245L, 86L, 161L,
349L, 172L, 275L, 127L, 235L, 12L, 134L, 89L, 73L, 87L, 354L,
140L, 231L, 357L, 291L, 91L, 106L, 86L, 89L, 302L, 43L, 272L,
12L, 151L, 37L, 93L, 106L, 270L, 29L, 15L, 111L, 196L, 266L,
99L, 259L, 40L, 6L, 45L, 56L, 116L, 6L, 372L, 88L, 174L, 218L,
231L, 36L, 12L, 359L, 135L, 164L, 92L, 152L, 151L, 116L, 258L,
122L, 98L, 36L, 106L)), row.names = c(NA, -366L), class = "data.frame")
CodePudding user response:
You can use rank()
but that would not deal correctly with repetitions, instead you could use data.table::frank()
library(data.table)
frank(c(1, 7, 9, 10, 14, 15), ties.method = 'dense')
# [1] 1 2 3 4 5 6
frank(c(15, 14, 10, 9, 7, 1), ties.method = 'dense')
# [1] 6 5 4 3 2 1
# Since you have duplicated entries it needs to deal correctly with that:
frank(c(15, 14, 10, 9, 7, 1, 15), ties.method = 'dense')
# [1] 6 5 4 3 2 1 6
If you prefer base R you could use match()
sort()
unique()
:
x <- c(15, 14, 10, 9, 7, 1, 15, 16)
match(x, unique(sort(x)))
# [1] 6 5 4 3 2 1 6 7