I'm very new to R and I'm stuck.
I have a data frame with a fixed-width part and a variable part, like this:
df <- data.frame(
var1 = c('a','b','c'),
var2 = c('d','e','f'),
freqA = c(3,1,0),
freqB = c(0,2,1),
R1 = c('A1A2A3 ','A1Ba1Ba2 ','Ba1 '))
which prints as:
> df
| var1 | var2 | freqA | freqB | R1 |
|------|------|-------|-------|----|
| a | d | 3 | 0 | A1A2A3 |
| b | e | 1 | 2 | A1Ba1Ba2 |
| c | f | 0 | 1 | Ba1 |
For each row:
freqA gives the number of "A." codes in column R1; each A code is 2 characters long.
freqB gives the number of "B.." codes in column R1; each B code is 3 characters long.
I would like to split the data in column R1 and organise it into 2 different data frames, like this:
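Since the A codes are 2 characters and the B codes 3, the patterns from the question ("A." and "B..") can be used directly as regular expressions. A minimal base-R sketch that counts the codes in each R1 string and checks them against freqA and freqB, assuming codes never overlap and that no other capital A or B appears in R1 (`count_codes()` is a hypothetical helper written for this example):

```r
df <- data.frame(
  var1 = c('a','b','c'),
  var2 = c('d','e','f'),
  freqA = c(3,1,0),
  freqB = c(0,2,1),
  R1 = c('A1A2A3 ','A1Ba1Ba2 ','Ba1 '))

# Hypothetical helper: gregexpr() finds every non-overlapping match of the
# pattern in each string, regmatches() turns the positions into substrings,
# and lengths() counts the matches per row
count_codes <- function(x, pattern) lengths(regmatches(x, gregexpr(pattern, x)))

countA <- count_codes(df$R1, "A.")   # "A" plus any 1 character: 2-character codes
countB <- count_codes(df$R1, "B..")  # "B" plus any 2 characters: 3-character codes

# TRUE when the counts found in R1 agree with the freqA / freqB columns
all(countA == df$freqA)
all(countB == df$freqB)
```

If either check prints FALSE, some row contains a code that does not follow the 2-character / 3-character rule.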
df_freqA
| var1 | R1 | R2 | R3 |
|------|----|----|----|
| a | A1 | A2 | A3 |
| b | A1 |  |  |
| c |  |  |  |
and
df_freqB
| var1 | R1 | R2 | R3 |
|------|----|----|----|
| a |  |  |  |
| b | Ba1 | Ba2 |  |
| c | Ba1 |  |  |
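One way to get exactly these two layouts is sketched below in base R. `split_codes()` is a hypothetical helper written for this example: it extracts every match of a pattern from R1, pads each row to the same number of columns and adds var1. Empty cells are NA here; passing `fill = ""` gives the blank cells shown above instead.

```r
df <- data.frame(
  var1 = c('a','b','c'),
  var2 = c('d','e','f'),
  freqA = c(3,1,0),
  freqB = c(0,2,1),
  R1 = c('A1A2A3 ','A1Ba1Ba2 ','Ba1 '))

# Hypothetical helper: one row per input row, columns var1, R1, R2, ...
split_codes <- function(data, pattern, n_cols = NULL, fill = NA_character_) {
  # List with one character vector of matches per row (character(0) if none)
  codes <- regmatches(data$R1, gregexpr(pattern, data$R1))
  # Default width: the largest number of matches in any row (at least 1)
  if (is.null(n_cols)) n_cols <- max(lengths(codes), 1L)
  # Pad each row with `fill` up to n_cols (max(0, ...) keeps rep() from
  # getting a negative 'times'), then truncate to exactly n_cols entries
  padded <- lapply(codes, function(x) {
    c(x, rep(fill, max(0, n_cols - length(x))))[seq_len(n_cols)]
  })
  wide <- matrix(unlist(padded), nrow = length(padded), byrow = TRUE)
  colnames(wide) <- paste0("R", seq_len(n_cols))
  data.frame(var1 = data$var1, wide, stringsAsFactors = FALSE)
}

df_freqA <- split_codes(df, "A.")               # 2-character A codes
df_freqB <- split_codes(df, "B..", n_cols = 3)  # 3-character B codes
df_freqA
df_freqB
```

Without `n_cols`, df_freqB would get only R1 and R2, because no row has more than 2 B codes; `n_cols = 3` is passed to reproduce the R3 column of the desired table.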
I've tried some stringr and dplyr functions with position arguments, but I can't get it to work. I also tried the rep() function, which failed with the error: invalid 'times' argument.
Any help will be very appreciated :)
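Because the widths are fixed, the freqA and freqB counts also give the exact position of every code. A position-based sketch, assuming (as in the sample data, though the question does not state it) that every A code comes before every B code in R1; `codes_at()` is a hypothetical helper. It builds positions with `seq_len(n)` instead of `rep()`, since `rep()` reports "invalid 'times' argument" when `times` is negative, NA, or a vector whose length does not match the input, and rows with a zero count are returned as `character(0)` without calling `substring()`.

```r
df <- data.frame(
  var1 = c('a','b','c'),
  var2 = c('d','e','f'),
  freqA = c(3,1,0),
  freqB = c(0,2,1),
  R1 = c('A1A2A3 ','A1Ba1Ba2 ','Ba1 '))

# Hypothetical helper: cut n codes of a fixed `width` out of string x,
# starting at character position `start`
codes_at <- function(x, start, width, n) {
  if (n == 0) return(character(0))           # nothing to extract in this row
  starts <- start + width * (seq_len(n) - 1)  # start position of each code
  substring(x, starts, starts + width - 1)
}

# A codes: 2 characters each, starting at position 1
a_codes <- mapply(function(x, nA) codes_at(x, 1, 2, nA),
                  df$R1, df$freqA, SIMPLIFY = FALSE, USE.NAMES = FALSE)

# B codes: 3 characters each, starting right after the 2 * freqA A characters
b_codes <- mapply(function(x, nA, nB) codes_at(x, 1 + 2 * nA, 3, nB),
                  df$R1, df$freqA, df$freqB, SIMPLIFY = FALSE, USE.NAMES = FALSE)

str(a_codes)
str(b_codes)
```

Unlike a regular expression, this approach relies entirely on freqA and freqB being correct, so a wrong count shifts every following code.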
Answer:
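You can try:
library(plyr)     # ldply(); load plyr before dplyr to avoid masking problems
library(dplyr)    # %>% pipe
library(stringr)  # str_extract_all()
# A codes: capital letter(s) followed by digit(s), e.g. "A1"
df1 = str_extract_all(df$R1, "[A-Z]+[0-9]+") %>% ldply(rbind)
df1 = `colnames<-`(df1, paste0('R', seq_len(ncol(df1))))
cbind(df[,1:3], df1)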
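If you would rather not depend on plyr (it is retired), stringr can do the padding itself: with `simplify = TRUE`, `str_extract_all()` returns a character matrix and fills missing positions with "" (blank cells, as in the question's desired tables) instead of NA. A sketch for the A pattern, keeping only var1 as the question asks; the same change applies to the B pattern.

```r
library(stringr)

df <- data.frame(
  var1 = c('a','b','c'),
  var2 = c('d','e','f'),
  freqA = c(3,1,0),
  freqB = c(0,2,1),
  R1 = c('A1A2A3 ','A1Ba1Ba2 ','Ba1 '))

# simplify = TRUE: one matrix row per input string, short rows padded with ""
mA <- str_extract_all(df$R1, "[A-Z]+[0-9]+", simplify = TRUE)
colnames(mA) <- paste0("R", seq_len(ncol(mA)))

# Keep only var1 next to the extracted codes
df_freqA <- data.frame(var1 = df$var1, mA)
df_freqA
```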
var1 var2 freqA R1 R2 R3
1 a d 3 A1 A2 A3
2 b e 1 A1 <NA> <NA>
3 c f 0 <NA> <NA> <NA>
# B codes: capital letter(s), lowercase letter(s), digit(s), e.g. "Ba1"
df2 = str_extract_all(df$R1, "[A-Z]+[a-z]+[0-9]+") %>% ldply(rbind)
df2 = `colnames<-`(df2, paste0('R', seq_len(ncol(df2))))
cbind(df[,1:3], df2)
var1 var2 freqA R1 R2
1 a d 3 <NA> <NA>
2 b e 1 Ba1 Ba2
3 c f 0 Ba1 <NA>