Home > OS >  split column in R data.frame
split column in R data.frame

Time:05-23

the data.frame is like:

|col_1|
|------|
|A |
|B |
|B |
|A |
|A |

The expected output should be:

|col_A    |col_B |
|---------|-------|
|A        |NA |
|NA       |B |
|A        |NA |
|NA       |B |
|A        |NA |
|A        |NA |

I can do this manually in excel but is there a concise way in R to get this done?

CodePudding user response:

Using tidyverse you could try the following. Add row numbers to make each row unique, and then pivot_wider to put data into wide format.

library(tidyverse)

df %>%
  mutate(rn = row_number()) %>%
  pivot_wider(values_from = col_1, names_from = col_1, names_prefix = "col_") %>%
  select(-rn)

Output

  col_A col_B
  <chr> <chr>
1 A     NA   
2 NA    B    
3 NA    B    
4 A     NA   
5 A     NA

CodePudding user response:

We can use dcast

library(data.table)
dcast(setDT(df1), seq_along(col_1) ~ paste0('col_', col_1), 
   value.var = 'col_1')[, .(col_A, col_B)]

-output

   col_A  col_B
   <char> <char>
1:      A   <NA>
2:   <NA>      B
3:   <NA>      B
4:      A   <NA>
5:      A   <NA>

Or with base R

out <- model.matrix(~ col_1-1, df1)
out[] <- unique(df1$col_1)[(NA^!(out)) * col(out)]
out <- as.data.frame(out)

data

df1 <- structure(list(col_1 = c("A", "B", "B", "A", "A")),
 class = "data.frame", row.names = c(NA, 
-5L))
  • Related