I have 50 columns of names, but here I have presented only 4 columns for convenience.
Name1 Name2 Name3 Name4
Rose,Ali Van,Hall Ghol,Dam Murr,kate
Camp,Laura Ka,Klo Dan,Dan Ali,Hoss
Rose,Ali Van,Hall Ghol,Dam Kol,Kan
Murr,Kate Ismal, Ismal Sian,Rozi Nas,Ami
Ghol,Dam Ka,Klo Rose,Ali Nor,Ko
Murr,Kate Ismal, Ismal Dan,Dan Nas,Ami
I want to assign numbers to each person based on the columns, a sequence of numbers.
For example, in Name 1, we get the numbers from 1-4. The repeated names will get the same numbers.
In Name 2, it should be started from 5 and so on. This will give me the following table:
Assign1 Assian2 Assian3 Assian4
1 5 8 12
2 6 9 13
1 5 8 14
3 7 10 15
4 6 11 17
3 7 9 15
I would like to have it without a loop, i.e.,sapply
,i.e., sapply(dat, function(x) match(x, unique(x)))
.
Using dplyr or tidyverse would be great.
CodePudding user response:
A tidyverse
solution with purrr::accumulate()
:
library(tidyverse)
df %>%
mutate(as_tibble(
accumulate(across(Name1:Name4, ~ match(.x, unique(.x))), ~ .y max(.x))
))
# Name1 Name2 Name3 Name4
# 1 1 5 8 12
# 2 2 6 9 13
# 3 1 5 8 14
# 4 3 7 10 15
# 5 4 6 11 16
# 6 3 7 9 15
CodePudding user response:
Because the values in each column depend on the values in the previous column, the calculations have to be done sequentially. This is probably most succinctly achieved by a loop. Remember that lapply
and sapply
are simply loops-in-disguise, and won't be quicker than an explicit loop.
Note that your expected output has a mistake in it (there is a number 17 which should be 16)
output <- setNames(df, paste0('Assign', seq_along(df)))
for(i in seq_along(output)) {
output[[i]] <- match(output[[i]], unique(output[[i]]))
if(i > 1) output[[i]] <- output[[i]] max(output[[i - 1]])
}
output
#> Assign1 Assign2 Assign3 Assign4
#> 1 1 5 8 12
#> 2 2 6 9 13
#> 3 1 5 8 14
#> 4 3 7 10 15
#> 5 4 6 11 16
#> 6 3 7 9 15
Edit
If you really want it without an explicit loop, you can do:
res <- sapply(seq_along(df), \(i) match(df[[i]], unique(df[[i]])))
res t(replicate(nrow(df), head(c(0, cumsum(apply(res, 2, max))), -1))) |>
as.data.frame() |>
setNames(paste0('Assign', seq_along(df)))
#> Assign1 Assign2 Assign3 Assign4
#> 1 1 5 8 12
#> 2 2 6 9 13
#> 3 1 5 8 14
#> 4 3 7 10 15
#> 5 4 6 11 16
#> 6 3 7 9 15
Created on 2023-01-13 with reprex v2.0.2
Data taken from question in reproducible format
df <- structure(list(Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali",
"Murr,Kate", "Ghol,Dam", "Murr,Kate"), Name2 = c("Van,Hall",
"Ka,Klo", "Van,Hall", "Ismal, Ismal", "Ka,Klo", "Ismal, Ismal"
), Name3 = c("Ghol,Dam", "Dan,Dan", "Ghol,Dam", "Sian,Rozi",
"Rose,Ali", "Dan,Dan"), Name4 = c("Murr,kate", "Ali,Hoss", "Kol,Kan",
"Nas,Ami", "Nor,Ko", "Nas,Ami")), row.names = c(NA, -6L),
class = "data.frame")
CodePudding user response:
Here is a tidyverse
approach:
First paste
the column name after each of the strings in all your columns, for sorting purpose later. Then pivot
it into a two-column df so that we can assign ID to them by match
. Finally pivot
it back to a wide format and unnest the list columns.
library(tidyverse)
df %>%
mutate(across(everything(), ~ paste0(.x, "_", cur_column()))) %>%
pivot_longer(everything(), names_to = "ab", values_to = "a") %>%
arrange(ab) %>%
mutate(b = match(a, unique(a)), .keep = "unused") %>%
pivot_wider(names_from = "ab", values_from = "b") %>%
unnest(everything())
# A tibble: 6 × 4
Name1 Name2 Name3 Name4
<int> <int> <int> <int>
1 1 5 8 12
2 2 6 9 13
3 1 5 8 14
4 3 7 10 15
5 4 6 11 16
6 3 7 9 15
Data
Taken from @Allan Cameron.
df <- structure(list(Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali",
"Murr,Kate", "Ghol,Dam", "Murr,Kate"), Name2 = c("Van,Hall",
"Ka,Klo", "Van,Hall", "Ismal, Ismal", "Ka,Klo", "Ismal, Ismal"
), Name3 = c("Ghol,Dam", "Dan,Dan", "Ghol,Dam", "Sian,Rozi",
"Rose,Ali", "Dan,Dan"), Name4 = c("Murr,kate", "Ali,Hoss", "Kol,Kan",
"Nas,Ami", "Nor,Ko", "Nas,Ami")), row.names = c(NA, -6L),
class = "data.frame")
CodePudding user response:
Update: The approach below is not ideal because ID's are not unique. Sorry.
Using a lookup table with tidyverse
:
library(dplyr)
library(tidyr)
lookup <-
df |>
pivot_longer(everything()) |>
distinct() |>
arrange(name) |>
transmute(name = value, value = row_number()) |>
deframe()
df |>
mutate(across(everything(), ~ recode(., !!!lookup)))
Output:
Name1 Name2 Name3 Name4
1 1 5 4 12
2 2 6 9 13
3 1 5 4 14
4 3 7 10 15
5 4 6 1 16
6 3 7 9 15
Data from @Allan Cameron, thanks.
CodePudding user response:
A shorter way could be:
colnames(df) <- map(seq(ncol(df)), function(n) paste0('assign', n))