I have a df with ~10000obs, 24variables. A sample of that df would be:
A <- c("EA32", "EA32", "EA32", "EA67", "EA67", "EA67", "EA67",
"EA34", "EA34", "EA34", "EA69", "EA69", "EA69", "EA69")
B <- c("c1", "c2", "c3", "c1", "c2", "c1", "c2",
"c2", "c2", "c1", "c2", "c2", "c3", "c3")
C <- c("USB", "UA", "UA", "UD", "UE", "USB", "UA",
"UA", "USB", "UZ", "UA", "UE", "USB", "UE")
df <- data.frame(A,B,C)
I would love to generate a df2 that looks like this:
B EA32 EA67 EA34 EA69
c1 USB USB NA NA
c1 NA NA UZ NA
c1 NA UD NA NA
c2 UA UA UA UA
c2 NA UE NA UE
c2 NA NA USB NA
c3 UA NA NA NA
c3 NA NA NA USB
c3 NA NA NA UE
I have tried:
df2 <- df %>% pivot_wider(names_from = A, values_from = C)
but that give me three rows and 5 columns with the data from df$C grouped into c("","") which I then try to separate using separate_rows:
df2 <- df %>% tidyr::separate_rows(df, EA32, sep = ",")
df2 <- df2 %>% tidyr::separate_rows(df, EA67, sep = ",") ...
but I have to write each line of code for each column and with a total of 24 columns that becomes very tedious and does not generate the correct output of df2 which I need. Any help coming up with a code to go from df to df2 would be greatly appreciated.
Thanks!
CodePudding user response:
Your problem is that for some A
values you have duplicate B
values, so you first need to creat an additional ID column based on A
, then complete
the tibble and then you can pivot_wider
:
library(tidyverse)
df %>%
group_by(A) %>%
mutate(id_col = 1:n()) %>%
ungroup() %>%
complete(id_col, A, B) %>%
pivot_wider(names_from = A, values_from = C) %>%
select(-id_col)
which gives:
# A tibble: 12 x 5
B EA32 EA34 EA67 EA69
<chr> <chr> <chr> <chr> <chr>
1 c1 USB NA UD NA
2 c2 NA UA NA UA
3 c3 NA NA NA NA
4 c1 NA NA NA NA
5 c2 UA USB UE UE
6 c3 NA NA NA NA
7 c1 NA UZ USB NA
8 c2 NA NA NA NA
9 c3 UA NA NA USB
10 c1 NA NA NA NA
11 c2 NA NA UA NA
12 c3 NA NA NA UE
CodePudding user response:
Use tidyr
and dplyr
:
library(dplyr)
library(tidyr)
df %>%
mutate(D=rownames(.)) %>%
pivot_wider(names_from=A, values_from=C, values_fill=NA) %>%
arrange(B) %>%
select(-D)
Output:
# A tibble: 14 x 5
B EA32 EA67 EA34 EA69
<chr> <chr> <chr> <chr> <chr>
1 c1 USB <NA> <NA> <NA>
2 c1 <NA> UD <NA> <NA>
3 c1 <NA> USB <NA> <NA>
4 c1 <NA> <NA> UZ <NA>
5 c2 UA <NA> <NA> <NA>
6 c2 <NA> UE <NA> <NA>
7 c2 <NA> UA <NA> <NA>
8 c2 <NA> <NA> UA <NA>
9 c2 <NA> <NA> USB <NA>
10 c2 <NA> <NA> <NA> UA
11 c2 <NA> <NA> <NA> UE
12 c3 UA <NA> <NA> <NA>
13 c3 <NA> <NA> <NA> USB
14 c3 <NA> <NA> <NA> UE
CodePudding user response:
Not quite the order you had, but perhaps workable? It seems that you're looking to group each of the matching B C combinations on their own row.
df %>%
group_by(B) %>%
arrange(B, C) %>%
mutate(row = cumsum(C != lag(C, 1, ""))) %>%
ungroup() %>%
pivot_wider( names_from = A, values_from = C)
# A tibble: 9 x 6
B row EA67 EA32 EA34 EA69
<chr> <int> <chr> <chr> <chr> <chr>
1 c1 1 UD NA NA NA
2 c1 2 USB USB NA NA
3 c1 3 NA NA UZ NA
4 c2 1 UA UA UA UA
5 c2 2 UE NA NA UE
6 c2 3 NA NA USB NA
7 c3 1 NA UA NA NA
8 c3 2 NA NA NA UE
9 c3 3 NA NA NA USB
CodePudding user response:
I don't think your output is correct, it can be shrunk further, using reshape2
library(reshape2)
dcast(df,B C~A,value.var="C")
B C EA32 EA34 EA67 EA69
1 c1 UD <NA> <NA> UD <NA>
2 c1 USB USB <NA> USB <NA>
3 c1 UZ <NA> UZ <NA> <NA>
4 c2 UA UA UA UA UA
5 c2 UE <NA> <NA> UE UE
6 c2 USB <NA> USB <NA> <NA>
7 c3 UA UA <NA> <NA> <NA>
8 c3 UE <NA> <NA> <NA> UE
9 c3 USB <NA> <NA> <NA> USB