I have the following data set:
> data_short
Symbol_ID GFP_Mean GFP_SD Cells
<chr> <dbl> <dbl> <dbl>
1 Control_0 0.0303 0.00657 7071.
2 XRCC4_7518 0.0396 0.00768 5022
3 XRCC5_7520 0.0305 0.00629 5781.
4 BRCA1_672 0.0178 0.00833 1822.
5 DDX48_9775 0.109 0.0201 239
6 HMGN1_3150 0.0997 0.00875 1173
7 PRDM13_59336 0.0789 0.00794 980
8 UBOX5_22888 0.0734 0.00653 1378
9 HIST1H2AE_3012 0.0719 0.00592 1906
10 HMGN2_3151 0.0691 0.00934 738
I tried to split the first column into two separate columns, and at first glance it seems to work:
data_short<-data_short %>% mutate(Symbol_ID=str_split_fixed(data_short$Symbol_ID, "_", 2))
Symbol_ID[,1] [,2] GFP_Mean GFP_SD Cells
<chr> <chr> <dbl> <dbl> <dbl>
1 Control 0 0.0303 0.00657 7071.
2 XRCC4 7518 0.0396 0.00768 5022
3 XRCC5 7520 0.0305 0.00629 5781.
4 BRCA1 672 0.0178 0.00833 1822.
5 DDX48 9775 0.109 0.0201 239
6 HMGN1 3150 0.0997 0.00875 1173
7 PRDM13 59336 0.0789 0.00794 980
8 UBOX5 22888 0.0734 0.00653 1378
9 HIST1H2AE 3012 0.0719 0.00592 1906
10 HMGN2 3151 0.0691 0.00934 738
But when I check str(data_short), it turns out Symbol_ID is still a single column rather than two:
> str(data_short)
tibble [1,177 × 4] (S3: tbl_df/tbl/data.frame)
$ Symbol_ID: chr [1:1177, 1:2] "Control" "XRCC4" "XRCC5" "BRCA1" ...
$ GFP_Mean : num [1:1177] 0.0303 0.0396 0.0305 0.0178 0.1088 ...
$ GFP_SD : num [1:1177] 0.00657 0.00768 0.00629 0.00833 0.02014 ...
$ Cells : num [1:1177] 7071 5022 5781 1822 239 ...
Why is that, and how can I fix it? Thanks in advance!
CodePudding user response:
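str_split_fixed outputs a character matrix, so it isn't ideal for working with data frame columns. mutate stores that whole two-column matrix inside the single Symbol_ID column, which is why str() reports chr [1:1177, 1:2] and the tibble still has only 4 columns. tidyr::separate would be more suitable in this case, e.g.: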
data_short %>%
  tidyr::separate(Symbol_ID, into = c("SymbolID1", "SymbolID2"), sep = "_")
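To make the difference concrete, here is a minimal sketch on a toy tibble built from the first three rows of the question's data. The names demo, Symbol and ID are made up for illustration, and convert = TRUE is an optional extra that was not part of the suggestion above. It shows the matrix-column that mutate plus str_split_fixed produces, the tidyr::separate fix, and an alternative that keeps str_split_fixed but assigns each matrix column on its own.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy data: first three rows of the question's table (hypothetical name `demo`)
demo <- tibble(
  Symbol_ID = c("Control_0", "XRCC4_7518", "XRCC5_7520"),
  GFP_Mean  = c(0.0303, 0.0396, 0.0305)
)

# Original approach: the whole n x 2 character matrix lands in ONE column
as_matrix <- demo %>%
  mutate(Symbol_ID = str_split_fixed(Symbol_ID, "_", 2))
is.matrix(as_matrix$Symbol_ID)  # the column itself is a matrix
ncol(as_matrix)                 # the tibble's column count has not grown

# Fix 1: tidyr::separate creates two real columns;
# convert = TRUE (optional) turns the numeric-looking ID into an integer
fixed <- demo %>%
  separate(Symbol_ID, into = c("Symbol", "ID"), sep = "_", convert = TRUE)

# Fix 2: keep str_split_fixed, but assign each matrix column separately
parts <- str_split_fixed(demo$Symbol_ID, "_", 2)
fixed2 <- demo %>%
  mutate(Symbol = parts[, 1], ID = parts[, 2]) %>%
  select(Symbol, ID, GFP_Mean)

str(fixed)
```

If some symbols could themselves contain an underscore, separate would by default warn and drop the extra pieces; passing extra = "merge" keeps everything after the first underscore in the second column, which is worth checking against the real data.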