How to swap the column and row entries in R-CodePudding

library(data.table)
dat1 <- data.table(id = c(1, 2, 34, 99),
           class = c("sports", "", "music, sports", ""),
           hobby = c("knitting, music, sports", "", "", "music"))
> dat1
  id         class                   hobby
1  1        sports knitting, music, sports
2  2                                      
3 34 music, sports                        
4 99                                 music

I have the above dataset, dat1, where each row corresponds to a unique id. For each id, multiple inputs for either class or hobby are separated by a comma.

I would like to exchange the row and column of this dataset so that I get the following:

     input class hobby
1   sports 1, 34     1
2 knitting           1
3    music    34 1, 99

In this dataset, each row corresponds to a unique input from dat1. Now the class and hobby columns are storing the corresponding ids from dat1, each separated by a comma.

Is there a quick way to swap the row and columns like this in R?

CodePudding user response：

Here is an (not that long) tidyverse option. The issue is that values for the future 'input` column are scattered into multiple columns and can contain several value into the same string.

First a pivot_longer to have all the future input into the same column:

dat2 = dat1 %>% pivot_longer(!id)
dat2
# A tibble: 8 x 3
     id name  value                    
  <dbl> <chr> <chr>                    
1     1 class "sports"                 
2     1 hobby "knitting, music, sports"
3     2 class ""                       
4     2 hobby ""                       
5    34 class "music, sports"          
6    34 hobby ""                       
7    99 class ""                       
8    99 hobby "music"

Then the approach is to use the stringr package to split all the potential input values to then pivoting again to have all combination. Then filtering for the row having an input (nchar(input)>0). Finally grouping by the input and name to paste the multiple id together id = paste0(id, collapse = ","). And a last pivot_wider to have the data frame as desired:

cbind(dat2, str_split(str_remove_all(dat2$value, ","), " ", simplify = T)) %>%
  select(!value) %>% 
  pivot_longer(!c("id", "name"), names_to = "name_2", values_to = "input") %>% 
  select(!name_2) %>% 
  filter(nchar(input)>0) %>%
  group_by(input, name) %>% 
  summarise(id = paste0(id, collapse = ",")) %>% 
  pivot_wider(names_from = name, values_from = id)

# A tibble: 3 x 3
# Groups:   input [3]
  input    hobby class
  <chr>    <chr> <chr>
1 knitting 1     NA   
2 music    1,99  34   
3 sports   1     1,34

CodePudding user response：

Alternative way, using dplyr, listing the values in the rows by calling strsplit then unnesting it, pivoting longer, back into pivot wider, to reverse the values.

library(dplyr)


dat1 |>
  rowwise() |> 
  mutate(across(-id, ~ ifelse(nchar(.x), strsplit(.x, ", "), list("")))) |> 
  unnest(cols = -id) |> 
  pivot_longer(cols = class:hobby) |> 
  filter(value != "") |> 
  pivot_wider(id_cols = value, names_from = name, values_from = id, 
              values_fn = \(x) paste(unique(x), collapse = " ,"))

  value    class hobby
  <chr>    <chr> <chr>
1 sports   1 ,34 1    
2 knitting NA    1    
3 music    34    1 ,99