Separate two different variables from a text column

Time:11-25

I'm very new to R and I'm stuck.

I have a dataframe with a fixed-width part and a variable part like this:

df <- data.frame(
 var1 =  c('a','b','c'), 
 var2 =  c('d','e','f'),
 freqA = c(3,1,0),
 freqB = c(0,2,1),
 R1 =    c('A1A2A3    ','A1Ba1Ba2  ','Ba1       '))

Which gives this presentation:

>df
|var1|var2|freqA|freqB|        R1|
|   a|   d|    3|    0|A1A2A3    |
|   b|   e|    1|    2|A1Ba1Ba2  |
|   c|   f|    0|    1|Ba1       |

For each line:

  • freqA gives the number of "A." tokens in column R1; each "A_" token is 2 characters long.

  • freqB gives the number of "B.." tokens in column R1; each "B__" token is 3 characters long.

I would like to split and organise the data in the R1 column into 2 different data frames like this:

df_freqA
    |var1|R1|R2|R3|
    |   a|A1|A2|A3|
    |   b|A1|  |  |
    |   c|  |  |  |

and

df_freqB
    |var1| R1| R2| R3|
    |   a|   |   |   |
    |   b|Ba1|Ba2|   |
    |   c|Ba1|   |   |

I've tried some stringr and dplyr functions with position arguments, but I can't get them to work. I also tried the rep() function, which failed with the error: invalid 'times' argument.

Any help would be much appreciated :)

CodePudding user response:

You can try,

library(plyr)
library(dplyr)
library(stringr)

# "A" tokens: an uppercase letter followed directly by a digit
df1 = str_extract_all(df$R1, "[A-Z]+[0-9]+") %>% ldply(rbind)
df1 = `colnames<-`(df1, paste0('R', seq_len(ncol(df1))))
cbind(df[,1:3], df1)
  var1 var2 freqA   R1   R2   R3
1    a    d     3   A1   A2   A3
2    b    e     1   A1 <NA> <NA>
3    c    f     0 <NA> <NA> <NA>


# "B" tokens: an uppercase letter, a lowercase letter, then a digit
df2 = str_extract_all(df$R1, "[A-Z]+[a-z]+[0-9]+") %>% ldply(rbind)
df2 = `colnames<-`(df2, paste0('R', seq_len(ncol(df2))))
cbind(df[,1:3], df2)
  var1 var2 freqA   R1   R2
1    a    d     3 <NA> <NA>
2    b    e     1  Ba1  Ba2
3    c    f     0  Ba1 <NA>
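
The question also describes R1 as fixed-width, with freqA and freqB giving the token counts, so the same split can probably be done in base R by position instead of by regex. Below is a minimal sketch of that idea, not a tested general solution. It assumes that all 2-character "A" tokens come before the 3-character "B" tokens in R1, as in the sample data. The helpers cut_tokens, pad and to_df are hypothetical names introduced here for illustration.

```r
df <- data.frame(
  var1  = c('a', 'b', 'c'),
  freqA = c(3, 1, 0),
  freqB = c(0, 2, 1),
  R1    = c('A1A2A3    ', 'A1Ba1Ba2  ', 'Ba1       '),
  stringsAsFactors = FALSE)

# Cut `n` tokens of `width` characters out of string `s`, beginning at `start`
cut_tokens <- function(s, start, n, width) {
  if (n == 0) return(character(0))
  from <- start + (seq_len(n) - 1) * width
  substring(s, from, from + width - 1)
}

# Right-pad a character vector with NA up to length `len`
pad <- function(x, len) c(x, rep(NA_character_, len - length(x)))

# Bind a list of token vectors into a data frame keyed by `key`
to_df <- function(tokens, key) {
  k <- max(lengths(tokens), 1)
  m <- do.call(rbind, lapply(tokens, pad, len = k))
  out <- data.frame(var1 = key, m, stringsAsFactors = FALSE)
  names(out)[-1] <- paste0('R', seq_len(k))
  out
}

# "A" tokens start at position 1 and are 2 characters wide
a_tokens <- mapply(function(s, nA) cut_tokens(s, 1, nA, 2),
                   df$R1, df$freqA, SIMPLIFY = FALSE, USE.NAMES = FALSE)

# "B" tokens start right after the "A" tokens and are 3 characters wide
b_tokens <- mapply(function(s, nA, nB) cut_tokens(s, 1 + 2 * nA, nB, 3),
                   df$R1, df$freqA, df$freqB, SIMPLIFY = FALSE, USE.NAMES = FALSE)

df_freqA <- to_df(a_tokens, df$var1)
df_freqB <- to_df(b_tokens, df$var1)
```

Unlike the regex version, this relies on freqA and freqB being correct for every row. If the counts and the string can disagree, the regex approach is likely the safer choice.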