How to assign numbers to names using in R-CodePudding

I have 50 columns of names, but here I have presented only 4 columns for convenience.

Name1       Name2         Name3      Name4
Rose,Ali    Van,Hall      Ghol,Dam   Murr,kate
Camp,Laura  Ka,Klo        Dan,Dan    Ali,Hoss
Rose,Ali    Van,Hall      Ghol,Dam   Kol,Kan
Murr,Kate   Ismal, Ismal  Sian,Rozi  Nas,Ami
Ghol,Dam    Ka,Klo        Rose,Ali   Nor,Ko
Murr,Kate   Ismal, Ismal  Dan,Dan    Nas,Ami

I want to assign numbers to each person based on the columns, a sequence of numbers.

For example, in Name 1, we get the numbers from 1-4. The repeated names will get the same numbers.

In Name 2, it should be started from 5 and so on. This will give me the following table:

Assign1 Assian2 Assian3 Assian4
      1       5       8      12
      2       6       9      13
      1       5       8      14
      3       7      10      15
      4       6      11      17
      3       7       9      15

I would like to have it without a loop, i.e.,sapply,i.e., sapply(dat, function(x) match(x, unique(x))).

Using dplyr or tidyverse would be great.

CodePudding user response：

A tidyverse solution with purrr::accumulate():

library(tidyverse)

df %>%
  mutate(as_tibble(
    accumulate(across(Name1:Name4, ~ match(.x, unique(.x))), ~ .y   max(.x))
  ))

#   Name1 Name2 Name3 Name4
# 1     1     5     8    12
# 2     2     6     9    13
# 3     1     5     8    14
# 4     3     7    10    15
# 5     4     6    11    16
# 6     3     7     9    15

CodePudding user response：

Because the values in each column depend on the values in the previous column, the calculations have to be done sequentially. This is probably most succinctly achieved by a loop. Remember that lapply and sapply are simply loops-in-disguise, and won't be quicker than an explicit loop.

Note that your expected output has a mistake in it (there is a number 17 which should be 16)

output <- setNames(df, paste0('Assign', seq_along(df)))
                   
for(i in seq_along(output)) {
  output[[i]] <- match(output[[i]], unique(output[[i]]))
  if(i > 1) output[[i]] <- output[[i]]   max(output[[i - 1]])
}

output
#>    Assign1  Assign2  Assign3  Assign4
#> 1        1        5        8       12
#> 2        2        6        9       13
#> 3        1        5        8       14
#> 4        3        7       10       15
#> 5        4        6       11       16
#> 6        3        7        9       15

Edit

If you really want it without an explicit loop, you can do:

res <- sapply(seq_along(df), \(i) match(df[[i]], unique(df[[i]]))) 
res   t(replicate(nrow(df), head(c(0, cumsum(apply(res, 2, max))), -1))) |>
  as.data.frame() |>
  setNames(paste0('Assign', seq_along(df)))
#>   Assign1 Assign2 Assign3 Assign4
#> 1       1       5       8      12
#> 2       2       6       9      13
#> 3       1       5       8      14
#> 4       3       7      10      15
#> 5       4       6      11      16
#> 6       3       7       9      15

^{Created on 2023-01-13 with reprex v2.0.2}

Data taken from question in reproducible format

df <- structure(list(Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali", 
"Murr,Kate", "Ghol,Dam", "Murr,Kate"), Name2 = c("Van,Hall", 
"Ka,Klo", "Van,Hall", "Ismal, Ismal", "Ka,Klo", "Ismal, Ismal"
), Name3 = c("Ghol,Dam", "Dan,Dan", "Ghol,Dam", "Sian,Rozi", 
"Rose,Ali", "Dan,Dan"), Name4 = c("Murr,kate", "Ali,Hoss", "Kol,Kan", 
"Nas,Ami", "Nor,Ko", "Nas,Ami")), row.names = c(NA, -6L), 
class = "data.frame")

CodePudding user response：

Here is a tidyverse approach:

First paste the column name after each of the strings in all your columns, for sorting purpose later. Then pivot it into a two-column df so that we can assign ID to them by match. Finally pivot it back to a wide format and unnest the list columns.

library(tidyverse)

df %>% 
  mutate(across(everything(), ~ paste0(.x, "_", cur_column()))) %>% 
  pivot_longer(everything(), names_to = "ab", values_to = "a") %>% 
  arrange(ab) %>% 
  mutate(b = match(a, unique(a)), .keep = "unused") %>% 
  pivot_wider(names_from = "ab", values_from = "b") %>% 
  unnest(everything())

# A tibble: 6 × 4
  Name1 Name2 Name3 Name4
  <int> <int> <int> <int>
1     1     5     8    12
2     2     6     9    13
3     1     5     8    14
4     3     7    10    15
5     4     6    11    16
6     3     7     9    15

Data

Taken from @Allan Cameron.

df <- structure(list(Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali", 
"Murr,Kate", "Ghol,Dam", "Murr,Kate"), Name2 = c("Van,Hall", 
"Ka,Klo", "Van,Hall", "Ismal, Ismal", "Ka,Klo", "Ismal, Ismal"
), Name3 = c("Ghol,Dam", "Dan,Dan", "Ghol,Dam", "Sian,Rozi", 
"Rose,Ali", "Dan,Dan"), Name4 = c("Murr,kate", "Ali,Hoss", "Kol,Kan", 
"Nas,Ami", "Nor,Ko", "Nas,Ami")), row.names = c(NA, -6L), 
class = "data.frame")

CodePudding user response：

Update: The approach below is not ideal because ID's are not unique. Sorry.

Using a lookup table with tidyverse:

library(dplyr)
library(tidyr)

lookup <-
  df |> 
  pivot_longer(everything()) |>
  distinct() |>
  arrange(name) |>
  transmute(name = value, value = row_number()) |>
  deframe()

df |>
  mutate(across(everything(), ~ recode(., !!!lookup)))

Output:

  Name1 Name2 Name3 Name4
1     1     5     4    12
2     2     6     9    13
3     1     5     4    14
4     3     7    10    15
5     4     6     1    16
6     3     7     9    15

Data from @Allan Cameron, thanks.

CodePudding user response：

A shorter way could be:

colnames(df) <- map(seq(ncol(df)), function(n) paste0('assign', n))