Pivot wide df but the values_from are not equal R-CodePudding

I have a df with ~10000obs, 24variables. A sample of that df would be:

A <- c("EA32", "EA32", "EA32", "EA67", "EA67", "EA67", "EA67",
       "EA34", "EA34", "EA34", "EA69", "EA69", "EA69", "EA69")
B <- c("c1", "c2", "c3", "c1", "c2", "c1", "c2",
       "c2", "c2", "c1", "c2", "c2", "c3", "c3")
C <- c("USB", "UA", "UA", "UD", "UE", "USB", "UA",
       "UA", "USB", "UZ", "UA", "UE", "USB", "UE")
df <- data.frame(A,B,C)

I would love to generate a df2 that looks like this:

B    EA32   EA67  EA34  EA69
c1   USB    USB   NA    NA
c1   NA     NA    UZ    NA
c1   NA     UD    NA    NA
c2   UA     UA    UA    UA
c2   NA     UE    NA    UE
c2   NA     NA    USB   NA
c3   UA     NA    NA    NA
c3   NA     NA    NA    USB
c3   NA     NA    NA    UE

I have tried:

df2 <- df %>% pivot_wider(names_from = A, values_from = C)

but that give me three rows and 5 columns with the data from df$C grouped into c("","") which I then try to separate using separate_rows:

df2 <- df %>% tidyr::separate_rows(df, EA32, sep = ",") 
df2 <- df2 %>% tidyr::separate_rows(df, EA67, sep = ",") ...

but I have to write each line of code for each column and with a total of 24 columns that becomes very tedious and does not generate the correct output of df2 which I need. Any help coming up with a code to go from df to df2 would be greatly appreciated.

Thanks!

CodePudding user response：

Your problem is that for some A values you have duplicate B values, so you first need to creat an additional ID column based on A, then complete the tibble and then you can pivot_wider:

library(tidyverse)
df %>%
  group_by(A) %>%
  mutate(id_col = 1:n()) %>%
  ungroup() %>%
  complete(id_col, A, B) %>%
  pivot_wider(names_from = A, values_from = C) %>%
  select(-id_col)

which gives:

# A tibble: 12 x 5
   B     EA32  EA34  EA67  EA69 
   <chr> <chr> <chr> <chr> <chr>
 1 c1    USB   NA    UD    NA   
 2 c2    NA    UA    NA    UA   
 3 c3    NA    NA    NA    NA   
 4 c1    NA    NA    NA    NA   
 5 c2    UA    USB   UE    UE   
 6 c3    NA    NA    NA    NA   
 7 c1    NA    UZ    USB   NA   
 8 c2    NA    NA    NA    NA   
 9 c3    UA    NA    NA    USB  
10 c1    NA    NA    NA    NA   
11 c2    NA    NA    UA    NA   
12 c3    NA    NA    NA    UE

CodePudding user response：

Use tidyr and dplyr:

library(dplyr)
library(tidyr)
df %>%
  mutate(D=rownames(.)) %>%
  pivot_wider(names_from=A, values_from=C, values_fill=NA) %>%
  arrange(B) %>%
  select(-D)

Output:

# A tibble: 14 x 5
   B     EA32  EA67  EA34  EA69 
   <chr> <chr> <chr> <chr> <chr>
 1 c1    USB   <NA>  <NA>  <NA> 
 2 c1    <NA>  UD    <NA>  <NA> 
 3 c1    <NA>  USB   <NA>  <NA> 
 4 c1    <NA>  <NA>  UZ    <NA> 
 5 c2    UA    <NA>  <NA>  <NA> 
 6 c2    <NA>  UE    <NA>  <NA> 
 7 c2    <NA>  UA    <NA>  <NA> 
 8 c2    <NA>  <NA>  UA    <NA> 
 9 c2    <NA>  <NA>  USB   <NA> 
10 c2    <NA>  <NA>  <NA>  UA   
11 c2    <NA>  <NA>  <NA>  UE   
12 c3    UA    <NA>  <NA>  <NA> 
13 c3    <NA>  <NA>  <NA>  USB  
14 c3    <NA>  <NA>  <NA>  UE

CodePudding user response：

Not quite the order you had, but perhaps workable? It seems that you're looking to group each of the matching B C combinations on their own row.

df %>%
  group_by(B) %>%
  arrange(B, C) %>%
  mutate(row = cumsum(C != lag(C, 1, ""))) %>%
  ungroup() %>%
  pivot_wider( names_from = A, values_from = C)


# A tibble: 9 x 6
  B       row EA67  EA32  EA34  EA69 
  <chr> <int> <chr> <chr> <chr> <chr>
1 c1        1 UD    NA    NA    NA   
2 c1        2 USB   USB   NA    NA   
3 c1        3 NA    NA    UZ    NA   
4 c2        1 UA    UA    UA    UA   
5 c2        2 UE    NA    NA    UE   
6 c2        3 NA    NA    USB   NA   
7 c3        1 NA    UA    NA    NA   
8 c3        2 NA    NA    NA    UE   
9 c3        3 NA    NA    NA    USB

CodePudding user response：

I don't think your output is correct, it can be shrunk further, using reshape2

library(reshape2)
dcast(df,B C~A,value.var="C")

   B   C EA32 EA34 EA67 EA69
1 c1  UD <NA> <NA>   UD <NA>
2 c1 USB  USB <NA>  USB <NA>
3 c1  UZ <NA>   UZ <NA> <NA>
4 c2  UA   UA   UA   UA   UA
5 c2  UE <NA> <NA>   UE   UE
6 c2 USB <NA>  USB <NA> <NA>
7 c3  UA   UA <NA> <NA> <NA>
8 c3  UE <NA> <NA> <NA>   UE
9 c3 USB <NA> <NA> <NA>  USB