My dataset has about 14 columns which look like this
X1 | X2 | X3 | X4 |
---|---|---|---|
1,2 | 1 | 2,3 | 1,2,3 |
1 | 2 | 3 | 1,3 |
I want to separate each column and the resulting data will look like this:
X1_1 | X1_2 | X1_3 | X2_1 | X2_2 | X2_3 | X3_1 | X3_2 | X3_3 | X4_1 | X4_2 | X4_3 |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | NA | 1 | NA | NA | 2 | 3 | NA | 1 | 2 | 3 |
1 | NA | NA | 2 | NA | NA | 3 | NA | NA | 1 | 3 | NA |
While I can do this separately for each column one by one, its not the most efficient way. I want to know if it can be done faster for all columns in one go, using loop_if, regex, or mutate_at or any other way.
CodePudding user response:
library(tidyverse)
df %>%
mutate(across(everything(), str_split, ","))%>%
unnest_wider(everything(), names_sep = '_')
# A tibble: 2 × 8
X1_1 X1_2 X2_1 X3_1 X3_2 X4_1 X4_2 X4_3
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 2 1 2 3 1 2 3
2 1 NA 2 3 NA 1 3 NA
Note: This does not give you columns that are NA. eg X2_2
, you could manually add those. Although it is not ideal.
if of course you need the NA columns:
df %>%
rowid_to_column()%>%
pivot_longer(-rowid, values_transform = as.character)%>%
separate_rows(value, convert = TRUE)%>%
group_by(rowid, name)%>%
mutate(name1 = row_number())%>%
ungroup()%>%
complete(rowid, name, name1)%>%
pivot_wider(rowid, names_from = c(name, name1), values_from = value)
# A tibble: 2 × 13
rowid X1_1 X1_2 X1_3 X2_1 X2_2 X2_3 X3_1 X3_2 X3_3 X4_1 X4_2 X4_3
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 2 NA 1 NA NA 2 3 NA 1 2 3
2 2 1 NA NA 2 NA NA 3 NA NA 1 3 NA
CodePudding user response:
do.call(cbind, Map(read.table, text=sapply(df, paste, collapse='\n'), sep=',', fill =TRUE))
X1.V1 X1.V2 V1 X3.V1 X3.V2 X4.V1 X4.V2 X4.V3
1 1 2 1 2 3 1 2 3
2 1 NA 2 3 NA 1 3 NA
>