Substitution of strings results in incorrect names

Time:04-02

I'd like to change several strings in a vector. In my case, I have the following in the all.images object:

# Original character vector
all.images <-c("S2B2A_20171003_124_IndianaIIPR00911120170922_BOA_10.tif",             
"S2B2A_20181028_124_IndianaIIPR0065820181024_BOA_10.tif",              
"S2B2A_20170715_124_SantaMariaCalcasPR0033420170731_BOA_10.tif",       
"S2B2A_20180928_124_NSraAparecidaBortolettoPR0042720180912_BOA_10.tif",
"S2A2A_20170610_124_LagoaAmarelaPR0022020170619_BOA_10.tif",           
"S2A2A_20160705_124_AguaSumidaPR001320160629_BOA_10.tif",              
"S2A2A_20181023_124_SaoPedroGabrielGarciaPR001720181031_BOA_10.tif",   
"S2B2A_20180908_124_NSraAparecidaBortolettoPR001920180911_BOA_10.tif", 
"S2A2A_20180824_124_NSraAparecidaBortolettoPR0043320180911_BOA_10.tif",
"S2A2A_20170720_124_VoAnaPR001520170802_BOA_10.tif",                   
"S2B2A_20180322_124_SaoMateusPR0021920180314_BOA_10.tif",              
"S2A2A_20181212_124_NSradeFatimaJoaoBatistaPR002320181128_BOA_10.tif", 
"S2A2A_20180413_081_SantaFeSebastiaoFogacaPR0021920180427_BOA_10.tif", 
"S2B2A_20170913_124_PerdizesPR0034920170905_BOA_10.tif",               
"S2A2A_20170610_124_TresMeninasPR001820170601_BOA_10.tif",             
"S2B2A_20180428_081_SantaFeSebastiaoFogacaPR0021020180501_BOA_10.tif", 
"S2B2A_20180508_081_SantaFeSebastiaoFogacaPR0022320180427_BOA_10.tif", 
"S2A2A_20170809_124_VoAnaPR001620170803_BOA_10.tif",                   
"S2B2A_20180819_124_PontalIIPR0012220180801_BOA_10.tif",               
"S2B2A_20181214_081_NSradeFatimaJoaoBatistaPR002320181128_BOA_10.tif", 
"S2A2A_20180423_081_SantaFeSebastiaoFogacaPR0033920180427_BOA_10.tif", 
"S2A2A_20180814_124_PontalIIPR0012220180801_BOA_10.tif",               
"S2B2A_20170715_124_VoAnaPR0015A20170803_BOA_10.tif",                  
"S2A2A_20160615_124_AguaSumidaPR0011220160627_BOA_10.tif",            
"S2A2A_20170720_124_SantaMariaCalcasPR0022820170726_BOA_10.tif",       
"S2A2A_20180913_124_SantaMariaCalcasPR001620180829_BOA_10.tif",        
"S2B2A_20170804_124_NSraAparecidaBortolettoPR0035720170811_BOA_10.tif",
"S2A2A_20170809_124_SantaFeBaracatPR001920170801_BOA_10.tif",          
"S2B2A_20180322_124_NSradeFatimaGlebaAPR001320180403_BOA_10.tif",      
"S2B2A_20180508_081_SantaFeSebastiaoFogacaPR0021920180427_BOA_10.tif")

My idea is to 1) remove S2B2A_ (or S2A2A_) and _BOA_10.tif; 2) convert the 8 digits after the prefix into a date (e.g. 2017-09-05); 3) move the three digits after that date (e.g. 124 or 081) to the end; and 4) split the remaining characters at the capital letters and the date (e.g. AguaSumidaPR0011220160627 to AguaSumida_PR00112_2016-06-27). But when I try to do:

sub("^\\w+_(\\d+)_(\\d+)_([A-Za-z]+)([A-Z]{2}\\d{3})(\\d)(\\d{4})(\\d{2})(\\d+)_.*",
     "\\3_\\4_\\5_\\6-\\7-\\8_\\1_\\2", all.images)
    
[1] "IndianaII_PR009_1_1120-17-0922_20171003_124"             
[2] "IndianaII_PR006_5_8201-81-024_20181028_124"              
 ...
[28] "SantaFeBaracat_PR001_9_2017-08-01_20170809_124"          
[29] "NSradeFatimaGlebaA_PR001_3_2018-04-03_20180322_124"      
[30] "SantaFeSebastiaoFogaca_PR002_1_9201-80-427_20180508_081" 

The dates come out wrong (e.g. 9201-80-427_20180508_081 in [30]). My desired output is:

[1] "IndianaII_PR009111_2017-09-22_2017-10-03_124"             
[2] "IndianaII_PR00658_2018-10-24_2018-10-28_124"              
 ...
[28] "SantaFeBaracat_PR0019_2017-08-01_2017-08-09_124"          
[29] "NSradeFatimaGlebaA_PR0013_2018-04-03_2018-03-22_124"      
[30] "SantaFeSebastiaoFogaca_PR00219_2018-04-27_2018-05-08_081"

Any help would be appreciated.
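The shifted dates come from the fixed-width part of the pattern, `([A-Z]{2}\\d{3})(\\d)`: it always takes exactly four digits after the two capital letters, but the PR codes in all.images have different lengths. A quick check in R, assuming (as in the desired output) that the second date is always the last eight digits of the name field; `name_fields` and `pr_codes` are illustrative names only:

```r
# Name fields of elements [1], [2] and [28] of all.images
name_fields <- c("IndianaIIPR00911120170922",
                 "IndianaIIPR0065820181024",
                 "SantaFeBaracatPR001920170801")

# Keep only the PR code: drop the leading place name and the trailing 8-digit date
pr_codes <- sub("^[A-Za-z]+([A-Z]{2}\\w+)\\d{8}$", "\\1", name_fields, perl = TRUE)
print(pr_codes)            # c("PR009111", "PR00658", "PR0019")
print(nchar(pr_codes) - 2) # digits after "PR": c(6, 5, 4)
```

Only PR codes with exactly four digits, such as PR0019 in [28] and PR0013 in [29], line up with the fixed-width group, which is why those are the rows whose dates came out right in the attempt above.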

Answer:

I think this handles the exceptions raised in the comments, using a lookahead:

sub("^\\w+_(\\d{4})(\\d{2})(\\d{2})_(\\d+)_([A-Za-z]+)([A-Z]{2}\\w+)(?=\\d{8})(\\d{4})(\\d{2})(\\d+)_.*",
    "\\5_\\6_\\7-\\8-\\9_\\1-\\2-\\3_\\4", all.images, perl = TRUE)
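As a cross-check, here is a step-by-step sketch in base R that follows the four steps from the question instead of a single regular expression. It assumes every file name has the form prefix_YYYYMMDD_code_NamePRcodeYYYYMMDD_BOA_10.tif and that the second date is always the last eight digits of the third field; `rename_image` and `imgs` are hypothetical names used only for illustration. The last lines compare its result with the lookahead pattern from this answer on five of the file names.

```r
# Five file names taken from all.images (positions 1, 2, 28, 29 and 30)
imgs <- c("S2B2A_20171003_124_IndianaIIPR00911120170922_BOA_10.tif",
          "S2B2A_20181028_124_IndianaIIPR0065820181024_BOA_10.tif",
          "S2A2A_20170809_124_SantaFeBaracatPR001920170801_BOA_10.tif",
          "S2B2A_20180322_124_NSradeFatimaGlebaAPR001320180403_BOA_10.tif",
          "S2B2A_20180508_081_SantaFeSebastiaoFogacaPR0021920180427_BOA_10.tif")

# Hypothetical helper: rebuild one file name following steps 1-4
rename_image <- function(x) {
  # Step 1: drop the "S2A2A_"/"S2B2A_" prefix and the "_BOA_10.tif" suffix
  core <- sub("^[^_]+_(.*)_BOA_10\\.tif$", "\\1", x)
  fields <- strsplit(core, "_", fixed = TRUE)[[1]]
  # Step 2: the first field is an 8-digit date (YYYYMMDD)
  acq_date <- format(as.Date(fields[1], format = "%Y%m%d"))
  # Step 3: the second field (e.g. "124" or "081") is moved to the end
  code <- fields[2]
  # Step 4: split e.g. "AguaSumidaPR0011220160627" into name, PR code and date,
  # assuming the date is always the last eight digits
  parts <- regmatches(fields[3],
                      regexec("^([A-Za-z]+)([A-Z]{2}\\w+)(\\d{8})$",
                              fields[3], perl = TRUE))[[1]]
  pr_date <- format(as.Date(parts[4], format = "%Y%m%d"))
  paste(parts[2], parts[3], pr_date, acq_date, code, sep = "_")
}

stepwise <- vapply(imgs, rename_image, character(1), USE.NAMES = FALSE)

# The lookahead pattern from the answer, applied to the same names
one_regex <- sub("^\\w+_(\\d{4})(\\d{2})(\\d{2})_(\\d+)_([A-Za-z]+)([A-Z]{2}\\w+)(?=\\d{8})(\\d{4})(\\d{2})(\\d+)_.*",
                 "\\5_\\6_\\7-\\8-\\9_\\1-\\2-\\3_\\4", imgs, perl = TRUE)

print(stepwise)
print(identical(stepwise, one_regex)) # TRUE
```

Splitting on `_` first keeps each regular expression small and lets `as.Date` validate the dates, at the cost of assuming the names contain no extra underscores.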