Make a synonym table-CodePudding

So I have a df containing codes and names associated with these codes. Basically, the codes are unique but may have multiple names. So, if you want to left_join() with another table without creating duplicate rows, you need to reorganize the table. I want to do it so that it is grouped by codes and the columns are "name1", "name2" etc.

input df

df<-data.frame(code=c(1,1,2),
                 name=c("a", "b", "c"))

result df

resdf<-data.frame(code=c(1,2),
                  name1=c("a", "c"),
                  name2=c("b",""))

I don't care if the unique values are filled with NAs or ""s or whatever.

I can't find a good solution using aggregate etc. and am stumped.

CodePudding user response：

You can pivot_wider after creating a variable that identifies the sequential numbers for each code.

library(tidyr)
library(dplyr)

df |>
  group_by(code) |> 
  mutate(sequential = row_number()) |> 
  pivot_wider(names_prefix = "name",
              names_from = "sequential",
              values_from = "name")

#> # A tibble: 2 × 3
#> # Groups:   code [2]
#>    code name1 name2
#>   <dbl> <chr> <chr>
#> 1     1 a     b    
#> 2     2 c     <NA>

^{Created on 2022-07-07 by the reprex package (v2.0.1)}

CodePudding user response：

Without external packages, you could use reshape().

reshape(transform(df, id = ave(code, code, FUN = seq_along)),
        direction = "wide", idvar = "code", timevar = "id", v.names = "name")

#   code name.1 name.2
# 1    1      a      b
# 3    2      c   <NA>

CodePudding user response：

Another way of solving your problem based on data.table package:

library(data.table)

mx = max(setDT(df)[, .N, code]$N)

df[, setNames(as.list(name)[1:mx], paste0("name", 1:mx)), code]

    code  name1  name2
1:     1      a      b
2:     2      c   <NA>