When using sapply, I get Error in str2lang(x) : <text>:1:31: unexpected symbol

Time:12-12

When I run this code, I get an error:

genes<-colnames(survdata)[-c(1:3)]
univ_formulas<-sapply(genes,function(x)as.formula(paste('Surv(OS,status)~',x)))
Error in str2lang(x) : <text>:1:31: unexpected symbol
1: Surv(OS,status)~ ABC7-42389800N19.1
                                  ^

If I remove the element and run the code again, a similar error appears again:

univ_formulas<-sapply(genes,function(x)as.formula(paste('Surv(OS,status)~',x)))
Error in str2lang(x) : <text>:1:26: unexpected symbol
1: Surv(OS,status)~ CITF22-1A6.3
                             ^

I don't know what is wrong.

example of the data:

head(genes,n = 50)
 [1] "A1BG"               "A1BG-AS1"           "A2M"               
 [4] "A2M-AS1"            "A2ML1"              "A2MP1"             
 [7] "A3GALT2"            "A4GALT"             "AAAS"              
[10] "AACS"               "AACSP1"             "AADAT"             
[13] "AAED1"              "AAGAB"              "AAK1"              
[16] "AAMDC"              "AAMP"               "AANAT"             
[19] "AAR2"               "AARD"               "AARS"              
[22] "AARS2"              "AARSD1"             "AASDH"             
[25] "AASDHPPT"           "AASS"               "AATF"              
[28] "AATK"               "AATK-AS1"           "ABAT"              
[31] "ABC7-42389800N19.1" "ABCA1"              "ABCA10"            
[34] "ABCA11P"            "ABCA12"             "ABCA13"            
[37] "ABCA17P"            "ABCA2"              "ABCA3"             
[40] "ABCA4"              "ABCA5"              "ABCA6"             
[43] "ABCA7"              "ABCA8"              "ABCA9"             
[46] "ABCB1"              "ABCB10"             "ABCB4"             
[49] "ABCB6"              "ABCB7"   

      

CodePudding user response:

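This is because the gene names contain -, which base::str2lang (the function named in the error message) parses as the subtraction operator rather than as part of a name. In ABC7-42389800N19.1 the parser reads ABC7 minus the number 42389800 and then stops at N19.1, which is the unexpected symbol at column 31. Names such as A1BG-AS1 raise no error, but they are silently read as A1BG minus AS1. We can fix this as follows: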

  • "Clean" gene names to convert - to _ and document this somewhere.

We then have:

genes <- c("ABC7-42389800N19.1", "AATK-AS1")
# gsub replaces every "-"; sub would only replace the first one in each name
sapply(genes,function(x)as.formula(paste('Surv(OS,status)~',
                                           gsub("-", "_",x, fixed = TRUE))))
$`ABC7-42389800N19.1`
Surv(OS, status) ~ ABC7_42389800N19.1
<environment: 0x000002ad508b58e8>

$`AATK-AS1`
Surv(OS, status) ~ AATK_AS1
<environment: 0x000002ad508b3c30>

This is an illustration of why that is the case:

A <- 4; B<- 20
str2lang("A-B")
A - B
eval(str2lang("A-B"))
[1] -16
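
To see why only some hyphenated names raise an error, the sketch below checks whether a few of the formula strings parse at all. The helper parses() is a hypothetical name introduced here for illustration; it is not part of base R or the survival package.

```r
# Hypothetical helper: TRUE if the text parses as R code, FALSE otherwise.
parses <- function(txt) {
  tryCatch({
    str2lang(txt)
    TRUE
  }, error = function(e) FALSE)
}

# Parses, but as "A1BG minus AS1", not as one gene name.
ok_but_wrong <- parses("Surv(OS,status)~ A1BG-AS1")

# Fail: after "-" the parser reads a number (42389800 or 1) and then
# meets a name (N19.1 or A6.3) with no operator in between.
fails_1 <- parses("Surv(OS,status)~ ABC7-42389800N19.1")
fails_2 <- parses("Surv(OS,status)~ CITF22-1A6.3")

# The name that does parse is split into two separate variables.
vars <- all.vars(str2lang("Surv(OS,status)~ A1BG-AS1"))
print(c(ok_but_wrong, fails_1, fails_2))
print(vars)
```

So a missing error does not mean the formula is right: every name containing - needs to be handled, not only the ones that stop the sapply call.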

str2lang is essentially similar to the dreaded eval-parse framework. From the docs, this is what it does:

str2expression(s) and str2lang(s) return special versions of parse(text=s, keep.source=FALSE) and can therefore be regarded as transforming character strings s to expressions, calls, etc.
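
As a small check of that description, and as a possible alternative to renaming, the sketch below compares str2lang with parse and then wraps a gene name in backticks so the parser treats it as a single symbol. This assumes the columns of survdata still carry the original hyphenated names; whether the downstream modeling code handles backticked names well is not something the original post confirms.

```r
# str2lang(s) returns the same call as the first element of parse(text = s).
s <- "Surv(OS,status) ~ A2M"
same <- identical(str2lang(s), parse(text = s, keep.source = FALSE)[[1]])
print(same)

# Backticks make the parser read a non-syntactic name as one symbol,
# so the original gene name can be kept unchanged.
gene <- "ABC7-42389800N19.1"
f <- as.formula(paste0("Surv(OS,status) ~ `", gene, "`"))
print(f)
print(all.vars(f))
```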

NOTE

  1. Since this is to be used in modeling, it is probably better to perform the substitution at the colnames stage, so that the input data to the model has the names we expect:
# not tested but you get the idea
colnames(survdata)[-c(1:3)]<-gsub("-", "_",colnames(survdata)[-c(1:3)], fixed = TRUE)
  2. It is important, for biological/research purposes, to document why gene names were cleaned as suggested in this answer.
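
The two notes above can be combined in one step: rename the columns and keep a lookup table of original and cleaned names as the documentation. This is a minimal sketch on a toy stand-in for survdata; the values, the ID column, and the names name_map, gene_cols, original and cleaned are assumptions made for illustration.

```r
# Toy stand-in for survdata (hypothetical values).
# check.names = FALSE keeps the hyphens in the column names.
survdata <- data.frame(
  ID = 1:3, OS = c(10, 20, 30), status = c(1, 0, 1),
  "A1BG-AS1" = c(0.1, 0.2, 0.3),
  "ABC7-42389800N19.1" = c(1.5, 2.5, 3.5),
  check.names = FALSE
)

gene_cols <- -c(1:3)                       # everything except the first three columns
original  <- colnames(survdata)[gene_cols]
cleaned   <- gsub("-", "_", original, fixed = TRUE)

# Replacing "-" could in principle make two names identical, so check first.
stopifnot(!anyDuplicated(cleaned))

# Lookup table documenting the renaming, to map results back to real gene names.
name_map <- data.frame(original = original, cleaned = cleaned)
colnames(survdata)[gene_cols] <- cleaned

# Build the formulas from the cleaned names, labelled by the original names.
univ_formulas <- lapply(cleaned, function(x) as.formula(paste("Surv(OS,status)~", x)))
names(univ_formulas) <- original
print(name_map)
```

Saving name_map alongside the results (for example with write.csv) keeps a record of exactly which gene names were changed.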