How to remove the prefix of each sample

Time:07-04

I'm stuck on removing the prefix from each sample name. I tried removing all the numbers within each name, but that does not give a good basis for grouping. I would like to keep only the last two dash-separated components as the sample name (for example, AAP-L). The titles are listed below. Thank you in advance!

geo$pd$title
[1] "AAB-HT002-AAP-L" "AAB-HT003-AAP-L" "AAB-HT006-AAP-L" "AAB-HT002-AAP-NL"
[5] "AAB-HT003-AAP-NL" "AAB-HT006-AAP-NL" "AAB-C007-AU-L" "AAB-HT001-AT-L"
[9] "AAB-N-C021-Normal-NC" "AAB-N-C022-Normal-NC" "AAB-C024-Normal-NC" "AAB-N-C025-Normal-NC"
[13] "AAB-HT010-AAP.T-L" "AAB-HT011-AAP-L" "AAB-HT012-AAP-L" "AAB-HT010-AAP.T-NL"
[17] "AAB-HT011-AAP-NL" "AAB-HT012-AAP-NL" "AAB-C013-AU-L" "AAB-C033-AU-L"
[21] "AAB-C037-AT-L" "AAB-C043-AU-L" "AAB-HT041-AU-L" "AAB-N-C026-Normal-NC"
[25] "AAB-N-C027-Normal-NC" "AAB-N-C028-Normal-NC" "AAB-N-C029-Normal-NC" "AAB-C014-AAP-L"
[29] "AAB-HT017-AAP.T-L" "AAB-HT018-AAP-L" "AAB-C014-AAP-NL" "AAB-HT017-AAP.T-NL"
[33] "AAB-HT018-AAP-NL" "AAB-C047-AT-L" "AAB-M044-AU-L" "AAB-N-C030-Normal-NC"
[37] "AAB-N-C032-Normal-NC" "AAB-N-C034-Normal-NC" "AAB-N-C035-Normal-NC" "AAB-C020-AAP.T-L"
[41] "AAB-C038-AAP-L" "AABM046-AAP-L" "AAB-C020-AAP.T-NL" "AABM046-AAP-NL"
[45] "AAB-C048-AT-L" "AAB-HT050-AT-L" "AAB-M-060-AU-L" "AAB-M-061-AU-L"
[49] "AAB-N-C036-Normal-NC" "AAB-N-C039-Normal-NC" "AAB-N-C042-Normal-NC" "AAB-N-C045-Normal-NC"
[53] "AAB-C052-AAP-L" "AAB-C076-AAP-L" "AAB-M056-AAP-L" "AAB-M058-AAP-L"
[57] "AAB-C052-AAP-NL" "AAB-C076-AAP-NL" "AAB-M056-AAP-NL" "AAB-M058-AAP-NL"
[61] "AAB-HT077-AU-L" "AAB-HT082-AU-L" "AAB-M080-AU-L" "AAB-N-C054-Normal-NC"
[65] "AAB-N-C055-Normal-NC" "AAB-N-C059-Normal-NC" "AAB-N-C062-Normal-NC" "AAB-C083-AAP-L"
[69] "AAB-HT009-AAP-L" "AAB-HT079-AAP-L" "AAB-SF086-AAP-L" "AAB-C083-AAP-NL"
[73] "AAB-HT079-AAP-NL" "AAB-SF086-AAP-NL" "AAB-C016-AU-L" "AAB-HT008-AU-L"
[77] "AAB-HT091-AT-L" "AAB-SF087-AU-L" "AAB-N-C063-Normal-NC" "AAB-N-C064-Normal-NC"
[81] "AAB-N-C065-Normal-NC" "AAB-HT103-AAP-L" "AAB-SF078-AAP.T-L" "AAB-SF099-AAP-L"
[85] "AAB-HT103-AAP-NL" "AAB-SF078-AAP.T-NL" "AAB-SF099-AAP-NL" "AAB-HT096-AT-L"
[89] "AAB-M094-AU-L" "AAB-SF089-AU-L" "AAB-SF090-AU-L" "AAB-SF100-AU-L"
[93] "AAB-N-C069-Normal-NC" "AAB-N-C070-Normal-NC" "AAB-N-C071-Normal-NC" "AAB-N-C072-Normal-NC"
[97] "AAB-N-C074-Normal-NC" "AAB-N-C075-Normal-NC" "AAB-N-C085-Normal-NC" "AAB-C092-Normal-NC"
[101] "AAB-M112-AAP-L" "AAB-SF104-AAP-L" "AAB-SF114-AAP-L" "AAB-SF115-AAP.T-L"
[105] "AAB-M112-AAP-NL" "AAB-SF104-AAP-NL" "AAB-SF114-AAP-NL" "AAB-SF115-AAP.T-NL"
[109] "AAB-C109-AU-L" "AAB-C111-AU-L" "AAB-HT101-AU-L" "AAB-M110-AT-L"
[113] "AAB-SF106-AU-L" "AAB-SF113-AU-L" "AAB-N-C098-Normal-NC" "AAB-N-C105-Normal-NC"
[117] "AAB-N-C107-Normal-NC" "AAB-N-C108-Normal-NC" "AAB-HT095-AAP.T-L" "AAB-HT095-AAP.T-NL"
[121] "AAB-HT097-AT-L" "AAB-C093-Normal-NC"
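A minimal base-R sketch of the goal stated above, assuming every title ends in exactly two dash-separated components that form the sample name (true for all 122 titles listed): capture those last two components with sub() and count samples per group with table(). The vector titles_demo is a hypothetical subset of geo$pd$title, chosen to include the irregular prefixes.

```r
# Hypothetical subset of geo$pd$title, copied from the listing above
titles_demo <- c("AAB-HT002-AAP-L", "AAB-N-C021-Normal-NC", "AABM046-AAP-L",
                 "AAB-M-060-AU-L", "AAB-HT017-AAP.T-NL", "AAB-C014-AAP-L")

# Keep only the final "component-component" part; [^-]+ is a run without a dash
group <- sub(".*-([^-]+-[^-]+)$", "\\1", titles_demo)
print(group)

# Count how many samples fall into each group
print(table(group))
```

Because the pattern is anchored at the end of the string, the number of prefix components in front of the sample name does not matter.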

Answer:

Try this:

library(stringr)

# test data:
string <- c("AAB-HT002-AAP-L", "AAB-HT017-AAP.T-L", "AAB-HT003-AAP-L", "AAB-HT006-AAP-L", "AAB-HT002-AAP-NL")

str_split_fixed(string, '-', n=3)[, 3]

# output:
[1] "AAP-L"   "AAP.T-L" "AAP-L"   "AAP-L"   "AAP-NL" 
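Note that n = 3 assumes each title has exactly two components in front of the sample name. That holds for this test data but not for every title in the question: "AAB-N-C021-Normal-NC" would give "C021-Normal-NC", and "AABM046-AAP-L" would give "L". A sketch that counts from the end instead, using base R's strsplit() and tail() (a swapped-in approach, not part of the answer above):

```r
# Titles from the question with differing numbers of leading components
x <- c("AAB-HT002-AAP-L", "AAB-N-C021-Normal-NC", "AABM046-AAP-L", "AAB-M-060-AU-L")

# Split each title on "-" and paste its last two pieces back together
last_two <- vapply(strsplit(x, "-", fixed = TRUE),
                   function(parts) paste(tail(parts, 2), collapse = "-"),
                   character(1))
print(last_two)
```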

Answer:

This extracts the last two components at the end of each title (each made of letters or periods), joined by a dash.

titles <- c("AAB-HT002-AAP-L", "AAB-HT003-AA.P-L", "AAB-HT006-AAP-L", "AAB-HT002-AA.P-NL")

sub("(.+)([-])([[:alpha:].]+[-][[:alpha:].]+$)", "\\3", titles)
[1] "AAP-L"   "AA.P-L"  "AAP-L"   "AA.P-NL"
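An extract-rather-than-replace variant, sketched with base R's regexpr() and regmatches(): match the final "component-component" run directly instead of substituting around it. One caveat of this approach is that regmatches() silently drops titles that do not match, so the result can be shorter than the input if any title lacks two trailing components.

```r
# Test titles, including irregular prefixes from the question
titles <- c("AAB-HT002-AAP-L", "AAB-HT003-AA.P-L", "AAB-N-C021-Normal-NC", "AABM046-AAP-NL")

# Locate the final "component-component" run at the end of each title
m <- regexpr("[^-]+-[^-]+$", titles)

# Extract the matched text; titles without a match would be dropped here
extracted <- regmatches(titles, m)
print(extracted)
```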

Answer:

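We could use str_remove() to delete everything up to and including the last digits followed by a dash (string is the test vector from the first answer):

library(stringr)
str_remove(string, ".*\\d+-")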
[1] "AAP-L"   "AAP.T-L" "AAP-L"   "AAP-L"   "AAP-NL" 
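If stringr is not available, base R's sub() with an empty replacement should behave the same way as str_remove() here, assuming (as holds for every title in the question) that the sample name is directly preceded by digits and contains no digits itself. A sketch that also passes the result to split() to group the original titles:

```r
# Titles taken from the question, including irregular prefixes
x <- c("AAB-HT002-AAP-L", "AAB-HT003-AAP-L", "AAB-HT017-AAP.T-L",
       "AAB-N-C021-Normal-NC", "AAB-M-060-AU-L")

# Delete everything up to and including the last digit that is followed by "-"
samples <- sub(".*\\d+-", "", x)
print(samples)

# Group the original titles by their sample name
groups <- split(x, samples)
print(groups)
```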